Eric Jack asked:

Windows Server 2008 R2 has become super-slow, unusable. Can't find cause.

I have a Windows Server 2008 R2 application server, running on my VMware 4.0 environment, and it is giving me heaps of problems that I can't figure out. Basically, the server's performance has been deteriorating over the past several months to the point where it has pretty much become unusable, and the applications running on it no longer work.

This server's duties are:
Symantec Endpoint Protection Manager
Print server
FLEXform Print Adapter
VMware vCenter Server
FlexLM (AutoCAD 2015 license manager)
ACT! Server

It's gotten to the point where the SEP won't update the clients, people can't print, or prints take forever, and users aren't able to sync their ACT! databases. When I'm able to log into the server, the task manager shows the CPU pegged at 100% most of the time and the memory usage stuck at about 80%.

This server has never been speedy, and at the moment I've been waiting 30 minutes to log into it. Last time I checked (last night), I couldn't even get into the Event Viewer due to some error with snap-ins. It took me two hours to get all the services up and running correctly last night after a reboot, which no longer seems to help at all. I'm at a loss as to where to even start troubleshooting this right now...
Temody

Check the Processes tab in Task Manager to determine what's going on.
jhyiesla
You don't say what your CPU and memory resources are, but depending on how big the VMware environment is, you may be overtaxing the server. Typically we make vCenter its own server. While it's possible to combine tasks on servers, we tend to isolate larger applications on their own servers. The above suggestion about Task Manager is a good one, as it will show you what's using up what.
Which process is taking the 80%?
Eric Jack (ASKER)

@jhyiesla - Yeah, sorry I forgot to include that info. I was in a rush to get the initial post up because I had to run to a meeting with a vendor. I can thank the Managed Service Provider who built the VM environment for putting vCenter on my app server. I would have preferred it on another box myself. But I am to blame for putting ACT! and SEP on the same server, as I was trying to reduce management/monitoring costs. However, things did at least run before, even if slowly.

The real kicker is I'm getting quotes from providers to rebuild my VM environment on new hosts and with VMware 6 plus upgrade the servers to Win 2012, and this *$@& server couldn't behave for just a few weeks longer!

Here's the specs:
The pair of host servers are identical Dell R510s with dual quad-core Xeon E5620 2.4GHz processors, 64GB of memory, and using 1.5TB of storage on an old EMC CLARiiON.

It looks like the app server is configured to use 1 vCPU and 16GB of memory. The storage allocated to the server is split into two drives (C and E), both of which have about 50GB free.

@Temody - When I look at Task Manager, Processes tab, I don't see what's causing the CPU overload. When sorting by the CPU column, nothing particular stays on top for more than a moment. However, the top users of memory are:
- tomcat6.exe 486,884 K
- svchost.exe 28X,XXX K
- SemSvc.exe *32 242,000 K
- sqlservr.exe *32 201,712 K
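When nothing stays at the top of the Task Manager list long enough to read, a saved snapshot can be easier to work with. The following is a minimal sketch, assuming the snapshot comes from `tasklist /fo csv` on an English-language Windows install with the usual five columns (Image Name, PID, Session Name, Session#, Mem Usage). The function name `parse_tasklist_csv` is made up for illustration.

```python
import csv
import io


def parse_tasklist_csv(text):
    """Parse `tasklist /fo csv` output and return (image_name, pid, mem_kb)
    tuples sorted by memory usage, largest first."""
    rows = []
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 5:
            continue  # ignore blank or malformed lines
        name, pid, mem = row[0], row[1], row[4]
        # Mem Usage looks like "486,884 K"; keep only the digits.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if not digits or not pid.isdigit():
            continue  # e.g. "N/A" memory values
        rows.append((name, int(pid), int(digits)))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows
```

Saving a snapshot with `tasklist /fo csv > procs.csv` every few minutes and feeding each file's contents to this function gives a comparable list over time. It only covers memory; it says nothing about which process is burning CPU.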
It looks like you're overdriving the memory on the VM, and the speed issue is probably from the paging.

While 16 GB is more than enough for vCenter alone (version 4 requires about 3 GB minimum), adding in the other things, especially a SQL server, is probably just overtaxing it.

And, depending on your environment, 2 hosts at 64 GB each could also be a little on the light side.

If you have it available, you might try adding more RAM to this server VM to see if that helps. You could also potentially go into SQL Server and limit the amount of RAM that it can use. SQL will basically take all that it can.
@jhyiesla - I agree the server is probably over-taxed. ACT! is pretty much known to be a dog all on its own. And I don't think SEP is a lightweight either. Plus printing, vCenter, and...

With the VMware environment getting upgraded in the next 6-8 weeks, all I need to do is slap a big enough band-aid on this app server to get it working until I can upgrade it. When I do upgrade, I'll break out each application to its own VM. Which also means I'm not going to go through the expense of upping the memory on the hosts, as new hosts will be installed posthaste.

I did double the memory on this VM from 8 to 16 several months ago, which didn't really seem to help. Should I go ahead and bump it up to 32?

Regarding SQL, I thought I did limit its memory usage, but perhaps I did it wrong? What's the correct way to check/change that?
If you have the resources, upping it to 32 can't hurt.  Unfortunately I'm not a SQL guy so I don't know the correct way to do that right off; I just know that it's possible.
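For reference, SQL Server's memory cap is the `max server memory (MB)` option, which can be read and set with the `sp_configure` system procedure. The sketch below only builds the T-SQL text; it does not connect to anything. The helper names and the 2048 MB figure in the usage note are illustrative assumptions, not values from this thread, and which instance to run the script against (the ACT! one or the SEP one) is something only the server's owner can tell.

```python
def show_max_memory_script():
    """Return T-SQL that displays the current 'max server memory (MB)' setting."""
    return "\n".join([
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        # With no value given, sp_configure just reports config_value/run_value.
        "EXEC sp_configure 'max server memory (MB)';",
    ])


def set_max_memory_script(limit_mb):
    """Return T-SQL that caps SQL Server memory at limit_mb megabytes."""
    if not isinstance(limit_mb, int) or limit_mb <= 0:
        raise ValueError("limit_mb must be a positive integer")
    return "\n".join([
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max server memory (MB)', {limit_mb};",
        "RECONFIGURE;",
    ])
```

Something like `print(set_max_memory_script(2048))` produces a script that can be pasted into a query window in Management Studio or saved to a file and run with `sqlcmd`. The same setting is also exposed on the Memory page of the server properties dialog in Management Studio.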
I updated the resource allocation on this VM. I bumped the memory from 16GB to 32GB and I also increased from 1 CPU to 4! After booting up again, I found the CPU usage was pegged at 100% on all 4 CPUs, and the memory usage was 70-90%.

Thinking this might be Symantec Endpoint Protection, since I've had some trouble with it, I disabled all its services and rebooted again. Upon boot-up, I found the memory usage was a much more reasonable 20-30%, but over time started to creep up again (memory leak somewhere?) However, the CPU was still pegged at 100%.

Next I ran MSConfig and specified a basic startup only. Then restarted. Upon boot-up, I found the CPU still pegged at 100% and the memory at 6%. Even though there are now only 17 processes and 7 services running (bare minimum.) The System process seems to be the top CPU user, as I type this, in the 90% range!

So... does this mean some inherent part of Windows Server 2008 R2 itself is what's buggered up!? I'm not sure what to look at next. I can't seem to narrow it down to an application that I can fix/remove/reload.
Look at this link and see if any of it helps.  I doubt that 2008 itself is the issue, but some of these troubleshooting techniques might help.

http://blogs.technet.com/b/yongrhee/archive/2009/08/07/how-to-troubleshoot-high-cpu-in-the-system-process.aspx
That's the article I used which gave me the idea to run MSConfig and boot to a basic boot-up. Which is still pegged at 100% CPU usage. The problem is the article references some tools I'm not familiar with, and the troubleshooting steps boil down to basically saying "try these tools if you still have problems after a clean boot with MSConfig."
On a side note: I just logged into the server now to continue trouble-shooting and the CPU is behaving! It's still booted up to the basic boot via MSConfig. However, the memory usage is 15GB+ and slowly climbing! Which seems unusual for a server that isn't doing anything.
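To put a number on "slowly climbing", a few timed readings of memory in use are enough to estimate a growth rate. This is a rough sketch using an ordinary least-squares slope; the function name and the sample figures in the usage note are hypothetical, and a steady upward slope by itself does not prove a leak (caches such as SQL Server's also grow until they hit a limit).

```python
def growth_rate(samples):
    """Estimate memory growth from (minutes_elapsed, used_mb) samples.

    Returns the least-squares slope in MB per minute.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t
```

For example, readings of 1000, 1100 and 1200 MB taken at 0, 10 and 20 minutes give a slope of 10 MB per minute. Comparing the slope with services stopped one at a time is one way to narrow down which of them is doing the growing.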
I think you said that some  version of SQL is installed.  SQL is a memory hog and will use whatever you give it, but it usually doesn't affect CPU usage as badly.
Would SQL have been running after a "clean" MSConfig boot?
My guess is yes... And unless you have it installed to run as a real SQL server, it's probably being used by some other application  that needs a DB.
Yeah, Symantec Endpoint Protection uses a built-in version of SQL, and ACT! also uses SQL Express, I believe. Since SEP is still shut down at the service level, I'm assuming it wasn't it. But the SQLE that ACT! uses may still have been gobbling memory.

Here's something Really Interesting (c), I rebooted the server back into normal mode and... it's behaving! At this very moment, CPU usage across the 4 CPUs is down to 0-20%, and even when I first booted it was never fully pegged at 100%. Memory usage is a solid 72%, but we've already established SQL's hunger.

Now I love it when things fix themselves, but I hate not knowing what the problem was. All I did was use MSConfig to boot the server into a "diagnostic startup" which only loads the basic devices and services. The CPU usage was pegged at 100%. I let the server run that way overnight for 15 hours, and when I checked this morning, the CPU was acting normally again. Then I rebooted it normally, and it's been behaving since! The only thing I need to do is start up the SEP services again so the clients are getting AV updates once more.

I can't really do more troubleshooting if the problem is not active. Is it possible that Windows was trying to load/update/fix something and couldn't because of everything else running, and having an overnight to run in a sort of "safe mode" allowed it to complete, now making the server better?

Bah! I hate inconclusive solutions!
I'm with you... it's great that it fixed itself, but not knowing why means that it might happen again. I suppose it's possible that Windows was trying to do something, but about the only thing I see that even begins to give me issues is when it's installing updates, and even then it's not this bad.

Overall, this server is just too overloaded. Moving to more hosts and more resources as well as splitting out your functions will most likely help immensely. It could have been a combination of things happening at the same time that have never happened before and one or more of them is done. My only other suggestion is to look through the event logs during that time for any errors that have occurred and see if they match up with times you notice the issue.
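Matching log entries against the times the slowdown was noticed can be done by hand, or with something like the sketch below. It assumes the events have already been exported from Event Viewer and turned into `(timestamp, message)` pairs; the function name, the 10-minute default window, and the data shape are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def events_near(incidents, events, window_minutes=10):
    """Map each incident time to the events logged within the window around it.

    incidents: list of datetime objects (when the slowdown was noticed)
    events:    list of (datetime, message) tuples from an exported log
    """
    window = timedelta(minutes=window_minutes)
    matches = {}
    for incident in incidents:
        matches[incident] = [
            (ts, msg) for ts, msg in events if abs(ts - incident) <= window
        ]
    return matches
```

An event source that shows up next to several incidents is a better lead than any single error read in isolation.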
The event logs have always been inconclusive. At least in my understanding.

I've restarted SEP and sure enough the CPU usage has skyrocketed. It's not the solid 100% slam it was before, but I suppose this is expected since SEP is now trying to get updates out to all the clients. As long as I don't see a solid line at 100%, I'm happier than I was. All I need is for this server to survive a few more weeks.

My goal is to rebuild my VMware and server infrastructure. This project is just beginning and I'm getting quotes for hardware, software and services. My intention is to build new Windows Server 2012 VMs and move the applications to these fresh servers (DC, Exchange, apps, etc.) And yes, I intend to create separate VMs for each main function... ACT! and SEP will wind up on different servers!
Yes, AV products will take over a system when they're trying to update everyone, so you are correct that what you are seeing is "normal". Reworking the VMs will definitely help things.
ASKER CERTIFIED SOLUTION
Eric Jack
Problem was never resolved. I can't close the thread without picking something. Not sure what else to do...