Solved

Win 2008 R2 File Server Latency Issues

Posted on 2014-01-07
Last Modified: 2014-01-08
First let me say this issue has yet to produce an event in event viewer.

We have a single Windows 2008 R2 file server (a VM on ESXi 5.1 in a redundant VSA environment) that has a problem.  It has only happened 7 times since May 2013, but it's a huge concern because it halts the entire company.  It also doesn't affect any of the other 8 VMs we have in the VSA configuration or the 7 VMs in vCenter (not in the VSA).

The issue is that the file server slowly comes to a halt.  The latency starts off affecting only a few users and then escalates to the point where the server is unresponsive at the console, but will still service remote requests after 3-5 minutes.  It takes anywhere from 3-5 hours to go from the first symptoms to the company grinding to a halt.  (The file server is very important to us.)

A reboot of the server immediately fixes the problem.  However, we also have folder redirection turned on (stored on this server) for AppData (Roaming), Desktop, and Favorites, so a reboot of the file server also requires a reboot of all the users' workstations, about 60 or so.

The file server has only File Services and FSRM installed, but the problem was occurring before FSRM was added.

Management wants an explanation and a resolution, and I basically have no idea where to start.  There are no logs and no events, and we do not currently have a third-party monitoring tool that would record these incidents for review.

VMware ESXi reports no unusual service request times for disk I/O, and no unusual network, CPU, or RAM usage on the machine during these times.

Time of day has been anywhere during working hours, morning, afternoon, and right before leaving.

In addition, if I wanted to start a new file server from scratch, could I create and boot a new VM, attach the existing vmdk files to the new VM as datastores, and boot into Windows with all the permissions and drive space intact, without having to perform a restore of any kind?  (I suspect Windows will want to format those disks in Disk Management, but I'm not sure.)

Any suggestions on where to move next?
Question by:PriorityResearch
7 Comments
 
Accepted Solution

by: WiReDWolf (earned 500 total points)
I've seen this behaviour with volume shadow copy services hanging trying to take a snapshot.  Do you take snapshots of your data during the day?
 

Author Comment

by:PriorityResearch
I believe AppAssure uses VSS to take snapshots every hour.  Our AppAssure server resides on a separate host with separate disk I/O, but is still within the vCenter environment.

AppAssure has been present since we installed the VSA environment.

What tends to cause the hang?

Any ideas on how to turn on logging to see if that is the issue, or on preventive steps to stop it from rearing its ugly head again?

Edit: Does Previous Versions also use VSS?

Expert Comment

by:WiReDWolf
Previous Versions also uses VSS, and I've found that VSS doesn't play nicely when multiple products are requesting snapshots from it.

I have a terminal server that exhibits identical behaviour to yours.  Every so often it will develop a resource leak and eventually choke itself out to the point the server is not functional.  Because the leak is gradual the server never logs anything until it gets to the point that it can't log anything.  A reboot solves the problem until it happens again which can be anywhere from a day to a couple of months.

It's annoying for us to have this server drop off every once in a while but it doesn't sound like it affects you as much as it does us.  

A suggestion would be to install some monitors to keep an eye on your resources.  If you develop a leak it would be good to be able to stop it before it takes the server down.  As an MSP I have my own tools to use but I'm pretty sure you can configure the Windows monitors.  If not I'll help you find some third party software.

Based on your detailed description of the issue I think the problem and solution lies with this one VM and it has nothing to do with being a VM.
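To make the monitoring suggestion above a bit more concrete, here is a minimal Python sketch of how a gradual leak could be flagged from periodic samples before the server chokes.  It assumes you already collect some resource counter at a fixed interval (for example a Performance Monitor counter such as \Memory\Pool Nonpaged Bytes exported to a file); the function names, the choice of counter, and the threshold are all hypothetical and would need tuning for a real server.

```python
from typing import Sequence


def leak_slope(samples: Sequence[float], interval_s: float) -> float:
    """Return the least-squares slope of the samples, in units per second.

    `samples` are counter readings taken every `interval_s` seconds.
    A steadily positive slope over hours is what a slow leak looks like.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [i * interval_s for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def looks_like_leak(samples: Sequence[float], interval_s: float,
                    min_slope: float) -> bool:
    """True if the counter grows at least `min_slope` units per second.

    `min_slope` is an assumed, site-specific threshold, not a known value.
    """
    return leak_slope(samples, interval_s) >= min_slope
```

Feeding it a few hours of readings and alerting when looks_like_leak() returns True would give a warning during the 3-5 hour build-up rather than after the server has stopped responding.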

 

Author Comment

by:PriorityResearch
I'd wholeheartedly agree with you on your last statement.  I've requested SolarWinds in the past but have never been able to make the ROI seem logical to the powers that be.  What third-party tools do you use, or would you suggest?  Windows performance monitors are local, and I'd really prefer something with central reporting, as we'd monitor more than just that one machine.

Expert Comment

by:pgm554
Agree, it sounds like a classic memory leak.

See:

http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
 

Author Comment

by:PriorityResearch
Thank you for the support so far.

I've checked in on a few things and noticed that, for some reason, my Previous Versions schedule had two triggers.

The first which was intentional was 7am to 7pm every hour daily.

The second was daily at noon.

Perhaps the combination of the two triggers was occasionally causing a hang?  If removing the duplicate does not resolve the issue, I will try disabling Previous Versions entirely.  (I don't want to, because recovering most deleted or accidentally overwritten files that way is much faster than mounting a restore point and searching for them in AppAssure.)  Going to mark this as solved.
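As a quick sanity check on that theory, the sketch below lists the times at which the two triggers described above would fire together.  It assumes the hourly trigger includes both the 7am and 7pm runs and that every trigger fires exactly on the hour; the function names are made up for illustration and do not correspond to any Windows API.

```python
from datetime import time
from typing import Iterable, List, Set


def hourly_triggers(start_hour: int, end_hour: int) -> List[time]:
    """Top-of-the-hour trigger times from start_hour to end_hour inclusive."""
    return [time(hour=h) for h in range(start_hour, end_hour + 1)]


def colliding_times(*schedules: Iterable[time]) -> List[time]:
    """Return the times that appear in more than one schedule, sorted."""
    seen: Set[time] = set()
    collisions: Set[time] = set()
    for schedule in schedules:
        # Deduplicate within a schedule so only cross-schedule overlaps count.
        for t in set(schedule):
            if t in seen:
                collisions.add(t)
            seen.add(t)
    return sorted(collisions)


if __name__ == "__main__":
    hourly = hourly_triggers(7, 19)   # 7am to 7pm, every hour
    noon = [time(hour=12)]            # second trigger, daily at noon
    for t in colliding_times(hourly, noon):
        print("Both triggers fire at", t.strftime("%H:%M"))
```

Under those assumptions the only shared start time is 12:00, so two shadow copy requests for the same volume would be issued at once every day at noon.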

Expert Comment

by:WiReDWolf
Thanks.

I would try disabling Previous Versions for a couple of weeks.  If the problem doesn't come back in that time, then you have probably found the cause.

I've used the Zenith platform and found that the VSS support in the backup software was what caused the problem far more often.  The Zenith backup back end is StorageCraft.  Since I disabled the VSS support in the backup software, I think I've only had a couple of outages due to resource leaks.

Almost anything can be set up with SNMP traps that can be centrally managed for monitoring.  I've found it to be a pain to set up, but there are plenty of third-party tools.  SolarWinds is one of them, but I'm sure there are more.  Again, I have much of this built into my MSP software, so I haven't really spent a lot of time looking.
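Until a central tool is in place, a small script run on a schedule from one workstation could at least record when the latency starts to climb.  The following is a rough Python sketch, not a replacement for proper monitoring: it times a small write, read, and delete in a directory and appends the result to a CSV file.  The function names and the idea of probing a dedicated folder on the share are assumptions for illustration.

```python
import csv
import os
import time
from datetime import datetime


def probe_once(directory: str, payload: bytes = b"x" * 4096) -> float:
    """Time one write + read + delete round trip in `directory` (seconds)."""
    path = os.path.join(directory, f"latency_probe_{os.getpid()}.tmp")
    start = time.perf_counter()
    with open(path, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())   # ask the OS to push the data to the target
    with open(path, "rb") as fh:
        fh.read()
    os.remove(path)
    return time.perf_counter() - start


def log_probe(directory: str, log_path: str) -> float:
    """Run one probe and append 'timestamp, seconds' to a CSV log."""
    elapsed = probe_once(directory)
    with open(log_path, "a", newline="") as fh:
        csv.writer(fh).writerow(
            [datetime.now().isoformat(timespec="seconds"), f"{elapsed:.4f}"]
        )
    return elapsed
```

Calling log_probe() every minute against a folder on the file server (a hypothetical path such as \\fileserver\share\probe) and keeping the CSV on the local disk would leave a timeline to show management the next time the slowdown begins.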