Solved

SERIOUSLY stumped - server 2008 running SQL 2012 just becomes VERY slow...

Posted on 2014-12-19
Last Modified: 2016-11-23
Ok, here is my story. If someone can help me solve this I will seriously send you a bunch of pizzas and award you any points I can.

We have a client that runs a mission-critical server: Windows Server 2008 R2 with SQL Server 2012 as the database for the application. The application and SQL Server are now both on one server because of this problem.

It seems that somewhere near the middle of each month the server slows to a crawl, SQL cannot finish queries, and the web app it runs just stops working. We tried everything with the software vendor (their software uses SQL Server; they set up SQL Server and the software, and we have a very expensive support contract with them). We had Dell scan the systems, held the server in stasis with no updates or changes for a month, pored through event logs, restarted all services, restarted the server, and so on. The ONLY thing that fixes it is shutting down the server, unplugging it for 10 minutes, and booting it back up. Then it's super fast and everything runs again.

While this problem is happening, processor utilization is up at about 50 percent, the server slows down to half speed (EVERYTHING runs slow), and SQL slows to a crawl. There are no special scripts, queries, or anything like that running that we can see or know about.

- previous to this setup, the server software and SQL server were on 2 new separate physical servers (everything is physical, no virtual)

- When this problem originally came along, nobody (vendor or IT) could figure it out. We didn't know the power-off trick yet, so the only option was a complete reinstall of the software and SQL together on another server. The problem came back a couple of months later on this completely different physical server.

- at least now we can fix it but it seriously disrupts the company and everyone freaks out

HELP!

Thanks

Question by: markgal26

Expert Comment

by:Gary Fuqua, CISSP
ID: 40510410
Is it a RAID configuration? If it is, run the RAID diagnostics on bootup and make sure all drives are running properly.

Expert Comment

by:Racim BOUDJAKDJI
ID: 40510639
Sorry to hear this unfortunate frequent situation you are facing.  

Your system requires performance tuning with advanced SQL Server skills. For instance, periodic slowdowns usually indicate that a few periodic processes demand more than your subsystem can cope with. The most frequent causes are the following:
> IO subsystem contention: your IO subsystem cannot satisfy the overhead of the monthly process. The solution is to increase your IO subsystem throughput in MB/sec or to decrease the required throughput.
> Blocking locks: the monthly process takes locks on objects that conflict with other applications. The solution is to avoid locking, at the price of extra space consumed.
> CPU contention: the system slows down during the monthly process because the CPUs are through the roof. The solution is to reduce the CPU footprint with the right indexes, if your third-party vendor lets you modify the schema. If you cannot reduce the CPU footprint, you will have to add CPUs or distribute your database among multiple servers.
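One way to check which of these three causes dominates is to look at SQL Server's cumulative wait statistics. The sketch below assumes SQL Server 2012 and VIEW SERVER STATE permission. It groups the top waits into rough buckets. The wait-type-to-cause mapping is a common rule of thumb, not an exact classification, and the list of excluded idle waits is deliberately partial.

```sql
-- Top waits accumulated since SQL Server last started (or since the stats were cleared).
-- Rough rule of thumb: PAGEIOLATCH_*/WRITELOG/IO_COMPLETION -> I/O,
-- LCK_M_* -> blocking locks, SOS_SCHEDULER_YIELD -> CPU pressure.
SELECT TOP 15
    ws.wait_type,
    ws.waiting_tasks_count,
    ws.wait_time_ms,
    ws.signal_wait_time_ms,  -- time spent waiting for a CPU after the resource became available
    CASE
        WHEN ws.wait_type LIKE 'PAGEIOLATCH%'
          OR ws.wait_type IN ('WRITELOG', 'IO_COMPLETION', 'ASYNC_IO_COMPLETION') THEN 'I/O'
        WHEN ws.wait_type LIKE 'LCK_M%' THEN 'Locks'
        WHEN ws.wait_type = 'SOS_SCHEDULER_YIELD' THEN 'CPU'
        ELSE 'Other'
    END AS rough_category,
    -- share of all non-excluded wait time (the window sum is computed before TOP)
    CAST(100.0 * ws.wait_time_ms / SUM(ws.wait_time_ms) OVER () AS DECIMAL(5, 2)) AS pct_of_filtered_waits
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type NOT IN (  -- partial list of idle/background waits that are usually noise
        'SLEEP_TASK', 'LAZYWRITER_SLEEP', 'SQLTRACE_BUFFER_FLUSH', 'REQUEST_FOR_DEADLOCK_SEARCH',
        'XE_TIMER_EVENT', 'XE_DISPATCHER_WAIT', 'LOGMGR_QUEUE', 'CHECKPOINT_QUEUE',
        'BROKER_TO_FLUSH', 'BROKER_TASK_STOP', 'CLR_AUTO_EVENT', 'WAITFOR',
        'DIRTY_PAGE_POLL', 'HADR_FILESTREAM_IOMGR_IOCOMPLETION', 'SP_SERVER_DIAGNOSTICS_SLEEP',
        'FT_IFTS_SCHEDULER_IDLE_WAIT', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')
  AND ws.wait_time_ms > 0
ORDER BY ws.wait_time_ms DESC;
```

These numbers are cumulative, so a restart (or a power cycle) resets them. They mean the most when they are captured shortly before a slowdown is "fixed".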

Saying more here would be lengthy and would not help much.

Since I love pizzas, here is the deal:
> Set up a secure VPN tunnel to your server with a sysadmin login and a local admin login.
> I take a quick look and let you know what I see for free.
> I recommend actions you implement.
> You are happy and send me pizzas.

This is a helping hand in the spirit of EE.  I do not expect any payment for this.  Let me know what you think.

At the same time, try to identify the processes that run periodically and start setting up metrics to see what happens during the slowdown:
> IO metrics: Performance Monitor --> PhysicalDisk: Avg. Disk sec/Write and Avg. Disk sec/Read on the physical volume where the data resides
> Locks: use the DMVs (dynamic management views)
> CPU metrics: Performance Monitor
> To isolate processes: DMVs and Profiler.

Hope this helps.
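The IO and lock checks above can also be approximated from inside SQL Server. This is a minimal sketch assuming SQL Server 2012 and VIEW SERVER STATE permission. The first query uses sys.dm_io_virtual_file_stats to get average read and write latency per database file, accumulated since startup, so it complements rather than replaces the Performance Monitor counters. The second query lists requests that are blocked right now and the session blocking each one. What counts as "slow" depends on the storage. Figures of roughly 10-20 ms average latency are often quoted as a warning level, but treat that as a rule of thumb.

```sql
-- 1) Average read/write latency per database file since SQL Server last started.
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    vfs.num_of_writes,
    CASE WHEN vfs.num_of_reads = 0 THEN 0
         ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS avg_read_ms,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;

-- 2) Requests that are currently blocked, and which session is blocking them.
SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time AS wait_time_ms,
    r.wait_resource,
    DB_NAME(r.database_id) AS database_name,
    t.text AS sql_text
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```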

Expert Comment

by:Racim BOUDJAKDJI
ID: 40510645
<<- previous to this setup, the server software and SQL server were on 2 new separate physical servers (everything is physical, no virtual)>>
Go back to physical. Virtualizing SQL Server is never a good idea. As a friend of mine puts it, there is nothing better than SQL Server to virtualize SQL Server.

<<- when this problem came along oringally, vendor, IT nobody could figure it out, didn't know the power off trick yet so only option was a complete re-install of the software and sql on another server together. problem comes back a couple months later on this completely different physical server>>
No. Reinstalling the binaries affects performance only in the exceptional case of known bugs. Performance is a physical matter: a process requires a quantity of resources that the system does not have at a certain point in time.

<<- at least now we can fix it but it seriously disrupts the company and everyone freaks out>>
You should tell that to the person who imposed the third-party vendor that designed a non-performant database application. The vendor is primarily responsible for ensuring that its application performs well. Don't you have some third-party evaluation process?

Expert Comment

by:Racim BOUDJAKDJI
ID: 40510647
Yeah and please try to increase your CPU pool to see if that has any effect.

Expert Comment

by:Racim BOUDJAKDJI
ID: 40510649
For CPU hogs (credit to Pinal Dave), run the following. Sorting by total_worker_time ranks statements by CPU; sorting by total_logical_reads ranks them by reads instead.
-- Top 10 cached statements by cumulative CPU time since their plans were cached.
-- total_worker_time and the elapsed-time columns are reported in microseconds.
SELECT TOP 10 SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.TEXT)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1) AS statement_text,
qs.execution_count,
qs.total_logical_reads, qs.last_logical_reads,
qs.total_logical_writes, qs.last_logical_writes,
qs.total_worker_time,
qs.last_worker_time,
qs.total_elapsed_time/1000000 total_elapsed_time_in_S,
qs.last_elapsed_time/1000000 last_elapsed_time_in_S,
qs.last_execution_time,
qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_worker_time DESC; -- CPU; use qs.total_logical_reads DESC to find read-heavy statements


Author Comment

by:markgal26
ID: 40510673
Thanks all! Reading through everything now. Everything is, and always was, on physical servers; we just went from 2 physical servers to one. The vendor is a big enough company with lots of clients in our sector. We were running the previous version of the same software on much older servers with no problems. This started after an upgrade last year to their new software, on new servers that are way over spec; they are powerful and then some. The problem first occurred on the new SQL server and then followed us from one server to another. The vendor, with many pros, tried for a full day to find and fix it before we opted for a full reinstall. We thought it was the hardware, so we moved off that SQL server. When the problem came back on the new one, we discovered the power-off trick and realized that 2 different servers were unlikely to have the exact same hardware issue. For that reason I don't think it's RAID or other hardware, but we scanned all the hardware anyway.

The key thing, I think, is that ONLY killing power to the box for 5 minutes fixes it. A reboot brings it back up in the exact same slow, non-working state.

Going to read all the replies again. Thx all

Expert Comment

by:Racim BOUDJAKDJI
ID: 40510675
My money is on the IO hypervisor. Good luck with your inquiries.

Author Comment

by:markgal26
ID: 40510686
Racimo - I'm no virtualization pro, so correct me if I'm wrong, but these are physical servers; nothing is virtual. Would the hypervisor role alone be affecting things? Should I remove it? Or are you thinking something is virtual and making that comment? Thanks

Expert Comment

by:Racim BOUDJAKDJI
ID: 40510691
<<Would the hypervisor role alone be affecting things?>>
Perhaps I misunderstood but is your current installation of SQL Server hosted on a virtual machine ?  

My point is that a virtual machine's IO subsystem performance depends directly on a software IO buffer pool managed by the hypervisor. So if you experience slowdowns after migrating from a physical machine to a virtual machine, that is often because the hypervisor takes a big share of the CPU and IO resources that SQL Server needs.

OTOH, if SQL Server is hosted only on a physical host, you can obviously ignore my last comment and move on with the suggestions made previously (metrics, DMVs, Profiler).

Hope this clarifies...

Author Comment

by:markgal26
ID: 40510697
Thanks! yes, everything is physical no virtual at all

original configuration: 1 physical server running SQL ---- 1 physical server running vendor software
After we had problems with the SQL server, they reinstalled their software AND SQL on one physical box, as we thought the original SQL box was having hardware issues
so new config is 1 physical server running SQL and vendor software together

shouldn't a simple reboot fix some of the performance issues at least temporarily? It's the powering off completely = fix that has me banging my head up against the wall, what is being "held" in memory during a simple reboot?

I will also look more closely at your other suggestions

thx

Expert Comment

by:Racim BOUDJAKDJI
ID: 40510726
<<shouldn't a simple reboot fix some of the performance issues at least temporarily? I>>
No. In the best case, a reboot only postpones the problem, by clearing cache resources or reinitializing process priorities; it does not fix it. That can often be achieved without even rebooting. Rebooting is not a solution.

<<what is being "held" in memory during a simple reboot?>>
To answer this question precisely, you need to look at what is inside the SQL Server procedure cache before rebooting.

<<I will also look more closely at your other suggestions>>
These are not mere suggestions: this is what I have done in hundreds of similar situations to solve the problem, and it is what you need to do to investigate efficiently and take effective action. As a reminder:

> Using Performance Monitor, evaluate PhysicalDisk: Avg. Disk sec/Write and PhysicalDisk: Avg. Disk sec/Read.
> Using the DMVs, identify the most CPU-consuming processes. Simply copy and paste the code provided to get a better idea, and post the results on this thread so that we can help.
> Also look at the procedure cache to see whether you have memory pressure (especially if the application uses ad hoc queries).

Beyond that, I am afraid I cannot help you further.
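For the procedure-cache check, one starting point is to see how the plan cache is split between plan types and how much of it is held by plans used only once. This is a sketch assuming SQL Server 2012 and VIEW SERVER STATE permission. A large share of single-use "Adhoc" plans is one common sign of plan-cache bloat. The "optimize for ad hoc workloads" server option is a frequently used mitigation, but whether it applies to this vendor's workload is an assumption to verify.

```sql
-- Plan cache broken down by object type (Adhoc, Prepared, Proc, ...),
-- including how much space is held by plans that have been used only once.
SELECT
    cp.objtype,
    COUNT(*) AS plan_count,
    SUM(CAST(cp.size_in_bytes AS BIGINT)) / 1024 / 1024 AS total_mb,
    SUM(CASE WHEN cp.usecounts = 1 THEN 1 ELSE 0 END) AS single_use_plans,
    SUM(CASE WHEN cp.usecounts = 1
             THEN CAST(cp.size_in_bytes AS BIGINT) ELSE 0 END) / 1024 / 1024 AS single_use_mb
FROM sys.dm_exec_cached_plans AS cp
GROUP BY cp.objtype
ORDER BY total_mb DESC;
```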

Author Comment

by:markgal26
ID: 40510751
Thanks I will try this.

I didn't mean that rebooting is a permanent solution (of course it's not); I was simply showing that:

Simple reboot = the server comes up and immediately has all the same issues (so it does not even alleviate them temporarily)

Power off and unplug for 5 minutes = the problem is GONE on reboot (until it comes back about a month or so later).

I was wondering what isn't happening during a reboot that IS happening during a full power down, why would one give us temporary relief and not the other.

I will start working with your suggestions and report back

Expert Comment

by:Racim BOUDJAKDJI
ID: 40510754
<<I was wondering what isn't happening during a reboot that IS happening during a full power down, why would one give us temporary relief and not the other.>>
I hear you.

Again, many things may explain what you see once the server reboots (memory pressure, procedure cache, transaction priority), and there is only one way to know for sure, which I described: looking in the procedure cache.

But my point is that you will spend more time focusing on the reboot relief than on doing what I suggested. I am trying to get you onto the shortest path so that SQL Server does not punish you further. It's really up to you. My only intention is to help you solve your problem quickly.

Based on the last suggestions, let us know what you have.  Good luck with the inquiry.

Expert Comment

by:Racim BOUDJAKDJI
ID: 40510761
Further questions that you can answer and help us help you...

How much memory do you have ?  What is the server used for ?  What is the database size ?  How many connections /sec ?

Regards...
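Most of these questions can be answered with a few catalog and DMV queries. This is a sketch assuming SQL Server 2012 and VIEW SERVER STATE permission. One caveat: "Logins/sec" in sys.dm_os_performance_counters is a cumulative value, so a rate needs two samples divided by the elapsed seconds.

```sql
-- Physical memory on the box and how much of it SQL Server is currently using.
SELECT
    sm.total_physical_memory_kb / 1024     AS server_total_mb,
    sm.available_physical_memory_kb / 1024 AS server_available_mb,
    sm.system_memory_state_desc,
    pm.physical_memory_in_use_kb / 1024    AS sql_in_use_mb
FROM sys.dm_os_sys_memory AS sm
CROSS JOIN sys.dm_os_process_memory AS pm;

-- Configured ceiling for SQL Server memory.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'max server memory (MB)';

-- Size of each database (data + log files); sys.master_files.size is in 8 KB pages.
SELECT
    DB_NAME(database_id) AS database_name,
    SUM(CAST(size AS BIGINT)) * 8 / 1024 AS size_mb
FROM sys.master_files
GROUP BY database_id
ORDER BY size_mb DESC;

-- Current user connections, plus the cumulative login counter
-- (sample 'Logins/sec' twice and divide the difference by the seconds between samples).
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%General Statistics%'
  AND counter_name IN ('User Connections', 'Logins/sec');
```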

Assisted Solution

by:Racim BOUDJAKDJI
Racim BOUDJAKDJI earned 300 total points
ID: 40510763
Also please post SQL Server Logs as well as System Logs at time before reboot.  

Here is a summary of the info we need:

> Performance Monitor IO and CPU metrics
> DMV
> SQL Server Logs
> System Logs
> Additional Info: Memory used, Usage type, Connections/sec.

Don't worry we will figure this out together.
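For the SQL Server logs, the error log can be searched from a query window with sp_readerrorlog. That procedure exists in SQL Server 2012 but is undocumented, so treat its parameter behaviour as an assumption. The sketch searches for the text of error 833 ("I/O requests taking longer than 15 seconds"), which is one possible sign of an IO subsystem that has stopped keeping up. Each SQL Server restart starts a new error log, so the log from before a reboot is usually log 1 or higher. The Windows System log still has to be read in Event Viewer.

```sql
-- Parameters: log number (0 = current), log type (1 = SQL Server, 2 = SQL Agent), search text.
EXEC sp_readerrorlog 0, 1, N'taking longer than 15 seconds';

-- Same search in the previous log file, e.g. the one written before the last restart.
EXEC sp_readerrorlog 1, 1, N'taking longer than 15 seconds';

-- Broader search for error lines in the previous log.
EXEC sp_readerrorlog 1, 1, N'Error';
```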

Expert Comment

by:Racim BOUDJAKDJI
ID: 40511406
Any updates on this ?

Let us know what you have.

Expert Comment

by:Vitor Montalvão
ID: 40513044
Can you provide more information?
What was the 2 servers solution configuration (Hardware, 32b or 64b architecture?, CPU, RAM, OS version, MSSQL version)?
What is the actual server solution configuration (Hardware, 32b or 64b architecture?, CPU, RAM, OS version, MSSQL version)?
The only thing that moving to a single-server solution could have solved was a network problem, but since the issue is still occurring, I think we can put that aside for now.

Author Comment

by:markgal26
ID: 40513343
Hi

Thanks for the replies guys -

Racimo: it's not occurring now, so I can't test right now, but I will and report back

Vitor: all 64-bit, OS 2008 R2 and SQL 2012. I can grab the full specs, but: Xeon processors, 6 cores, 32 GB RAM each

we initially moved as we thought the SQL server itself was having hardware issues and no matter what anyone did (reboots included) the issue would persist.  It wasn't until we were moved to one server and the issue came back that we discovered powering down and unplugging would fix it for about 30 days

Expert Comment

by:Racim BOUDJAKDJI
ID: 40513357
<< its not occurring now so I can't test right now but I will and report back>>
Perhaps I was not clear enough: you do not have to collect these metrics while the problem is occurring. The values you observe can point to a course of action that proactively prevents this problem from occurring again.

I hope I make sense now.

Hope this helps.
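One way to act on this is to capture periodic snapshots while things are healthy and compare them later. The sketch below keeps wait statistics in a table and computes the wait time accumulated between the two most recent snapshots. The table name dbo.WaitStatsBaseline is hypothetical, the sketch assumes it is created in a utility database of your choice, and the 15-minute schedule is only an example. Because the counters reset when SQL Server restarts, a restart between two snapshots produces negative deltas.

```sql
-- Hypothetical baseline table (name and location are illustrative only).
IF OBJECT_ID(N'dbo.WaitStatsBaseline', N'U') IS NULL
    CREATE TABLE dbo.WaitStatsBaseline (
        captured_at         DATETIME2(0) NOT NULL,
        wait_type           NVARCHAR(60) NOT NULL,
        waiting_tasks_count BIGINT       NOT NULL,
        wait_time_ms        BIGINT       NOT NULL,
        signal_wait_time_ms BIGINT       NOT NULL
    );

-- Take one snapshot; schedule this (e.g. a SQL Server Agent job every 15 minutes).
INSERT INTO dbo.WaitStatsBaseline
    (captured_at, wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT SYSDATETIME(), wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;

-- Wait time accumulated between the two most recent snapshots.
WITH snaps AS (
    SELECT DISTINCT TOP 2 captured_at
    FROM dbo.WaitStatsBaseline
    ORDER BY captured_at DESC
)
SELECT cur.wait_type,
       cur.wait_time_ms - prev.wait_time_ms AS wait_ms_in_interval
FROM dbo.WaitStatsBaseline AS cur
JOIN dbo.WaitStatsBaseline AS prev
  ON prev.wait_type = cur.wait_type
WHERE cur.captured_at  = (SELECT MAX(captured_at) FROM snaps)
  AND prev.captured_at = (SELECT MIN(captured_at) FROM snaps)
ORDER BY wait_ms_in_interval DESC;
```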

Expert Comment

by:Vitor Montalvão
ID: 40514434
Did you run any SQL Profiler to capture SQL statements, locks and errors?
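On SQL Server 2012, an Extended Events session is a lighter-weight alternative to a Profiler trace for capturing statements and errors. This is a different tool from the Profiler suggested above, shown as a sketch. The session name and the C:\XEvents\ path are hypothetical, and the folder must already exist. The 5-second threshold is an arbitrary example. The duration field of sql_statement_completed is in microseconds on SQL Server 2012, which is worth confirming on the target build.

```sql
-- Capture errors (severity 16+) and statements running longer than 5 seconds to a file.
CREATE EVENT SESSION [slowdown_capture] ON SERVER
ADD EVENT sqlserver.error_reported (
    ACTION (sqlserver.sql_text, sqlserver.session_id, sqlserver.database_id)
    WHERE ([severity] >= (16))),
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text, sqlserver.session_id, sqlserver.database_id)
    WHERE ([duration] > (5000000)))  -- 5 seconds, in microseconds
ADD TARGET package0.event_file (
    SET filename = N'C:\XEvents\slowdown_capture.xel', max_file_size = (100))  -- MB per file
WITH (MAX_DISPATCH_LATENCY = 30 SECONDS, STARTUP_STATE = ON);
GO

ALTER EVENT SESSION [slowdown_capture] ON SERVER STATE = START;
GO

-- Read the captured events back as XML.
SELECT CAST(event_data AS XML) AS event_xml
FROM sys.fn_xe_file_target_read_file(N'C:\XEvents\slowdown_capture*.xel', NULL, NULL, NULL);
```

STARTUP_STATE = ON restarts the session automatically after a reboot, so the capture keeps running across the power cycles described in this thread.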

Author Comment

by:markgal26
ID: 40531962
Hi all - I haven't forgotten about this and appreciate the help.

I'll be running the diags this weekend and post back the results

Accepted Solution

by:
markgal26 earned 0 total points
ID: 40546792
Hi everyone

OK, here is what we have so far. I was gathering data using your suggested methods when it happened again, so I had to power it all down completely and unplug it. Wouldn't you know it, a completely NEW server with a different OS, no SQL, running Exchange 2013, did the SAME THING shortly after, to a T: processor pegged, running really, really slow for no reason at all. (These are two physical servers, no VMs, new, with lots of processing power and RAM.)

We unplugged it and did the same thing. That fixed it, and it's back running at blazing speed.

We decided to move the servers out of the server room and into our IT area (off the UPS and off the server room circuits), and we haven't had any issues since.

We are going to start moving things around and testing the voltage in each area, as well as all the UPS units, but we THINK this is all power related (it has happened to 3 servers, all in the same area of our server room, all fairly new).

Thanks for all your help so far, how should I handle this question from here?

Expert Comment

by:Racim BOUDJAKDJI
ID: 40547213
<<Thanks for all your help so far, how should I handle this question from here?>>
If you believe we helped in one way or another, share the points; otherwise, simply ask for the question to be closed and your points refunded.

Most important is that your problem got fixed.

Author Closing Comment

by:markgal26
ID: 40561607
It appears to be a power issue, which was not addressed in any of the proposed solutions.

Author Comment

by:markgal26
ID: 40742016
I just want to add another note in case someone else comes across this. THIS IS THE ISSUE, power all along:

http://serverfault.com/questions/641212/unexplicable-extreme-slowness-on-dell-poweredge-r320-fixed-only-by-cold-reboot

I'll paste in the content in case the link goes dead:

At this customer's site, there are two new Dell PowerEdge R320 servers with the following configuration:

----------------------------------------------------------------------

A single 6-core CPU
16 GB of RAM
2x500 GB SATA disks in a RAID 1 array
The O.S. is Windows Server 2012 R2, used as a Domain Controller; all firmwares and drivers are up to date, and Windows is fully patched; the system load is usually very low.

All of a sudden, one of the servers slowed down to a crawl. And by "crawl", I mean "it wasn't even able to paint a window in a decent time". Doing anything at all, even right-clicking and showing up the contextual menu, even moving the cursor around, was an excruciating pain.

There was no unusual load on the server: CPU usage was 1-3%, RAM usage below 4 GB, no disk or network peaks, nothing at all.

There were also no errors whatsoever in any Windows event log (when we finally managed to open it), and the slowness didn't cease when the network cable was disconnected.

Rebooting Windows was useless, too: after a very long boot time, the system remained awfully slow as before.

Last but not least, there were no error messages either on the system's front panel display, or on the screen during the POST.

As a last resort, we decided to try a cold boot, and actually disconnected the power cables before restarting the server. This fixed the problem: the system booted normally and resumed full performance.

However, the question remains: what happened here?!?

And, more important: how can we make sure it won't happen again?

Tags: performance, dell, dell-poweredge, windows-server-2012-r2
Asked Oct 31 '14 at 18:10 by Massimo (edited Oct 31 '14 at 18:29)
First thing that comes to my mind would be running the Dell diagnostic tools to see if there's a hardware fault or impending failure. – HopelessN00b Oct 31 '14 at 18:18

Have you tried running the F2 Diagnostics? Your ECC RAM could be having issues. – zillabunny Oct 31 '14 at 18:24

Had the same issue with an R320: Windows rebooted after installing a small batch of 8 Windows updates, and the server came back extremely slow, and my Exchange 2013 services, including the Exchange RPC service, wouldn't start. Warm reboots didn't change anything, and the event logs were ambiguous and slow to read, so it appeared as if the Windows updates were the cause. Luckily, I took a step back, researched the slowness issues with Win 2012 and the Dell R320, and found your post. A cold boot and clearing the power fixed the speed issues, and the Exchange services all started correctly. I could only conclude tha – Terrence Apr 16 at 0:22
Answer (score 1):
Had an identical problem. After examining DSET logs taken while the issue was present and again after the cold-boot fix, Dell support blamed a power surge; the server was powered by an APC 1500kVA SmartUPS at the time.

Dell support recommended a cold boot to reset the sensors (unplug the power, then hold down the power button for more than 3 seconds).

Support also suggested updating iDRAC to the latest available version, 1.66.65, either through the Lifecycle Controller (requires reboots) or from Windows (does not require a reboot).

This happened a few weeks ago, during the first week of January 2015, and the problem has not returned.

ESM_Firmware_3F4WV_WN64_1.66.65_A00.EXE

Dell PowerEdge R320, 6-core CPU, 24 GB of RAM, 2 x 1000 GB NL-SAS disks in RAID 1

Author Comment

by:markgal26
ID: 41418325
And to add to this (I'm the OP), this is what ultimately solved our problem:

Updating to the latest BIOS. When the server came back from the reboot, no unplugging was required, and it was fast and perfect again.