Solved

Win 2008 R2 File Server Latency Issues

Posted on 2014-01-07
Last Modified: 2014-01-08
First, let me say this issue has yet to produce an event in Event Viewer.

We have a single Windows 2008 R2 file server (a VM on ESXi 5.1 in a redundant VSA environment) that has a problem. It has only happened 7 times since May 2013, but it's a huge concern because it halts the entire company. It also doesn't affect any of the other 8 VMs we have in the VSA configuration or the 7 VMs in vCenter (not in the VSA).

The issue is that the file server slowly comes to a halt. The latency starts off affecting only a few users and then escalates to the point where the server is unresponsive at the console but will still service remote requests after 3-5 minutes. It takes anywhere from 3 to 5 hours for the symptoms to go from first rearing their head to bringing the company to a grinding halt. (The file server is very important to us.)

A reboot of the server immediately fixes the problem. However, we also have folder redirection turned on (stored on this server) for AppData (Roaming), Desktop, and Favorites, so a reboot of the file server also requires a reboot of all the users' workstations, about 60 or so.

The file server has only File Services and FSRM installed, but the problem was occurring before FSRM was installed.

Management wants an explanation and a resolution, and I basically have no idea where to start. There are no logs, no events, and we currently do not have a third-party monitoring tool that would record these happenings for review.

VMware ESXi reports nothing unusual in service request times or in the disk I/O, network, CPU, or RAM usage of the machine during these times.

It has happened at various times during working hours: morning, afternoon, and right before leaving.

In addition, if I wanted to start a new file server from scratch, can I build a new VM, attach the vmdk files to the new VM as disks, and boot into Windows with all the permissions and drive space intact, without having to perform a restore of any kind? (I kind of suspect Windows will want to format those disks during diskmgmt operations, but I'm not sure.)

Any suggestions on where to move next?
Question by:PriorityResearch
7 Comments
 
LVL 3

Accepted Solution

by:
WiReDWolf earned 500 total points
ID: 39763741
I've seen this behaviour when the Volume Shadow Copy Service (VSS) hangs while trying to take a snapshot. Do you take snapshots of your data during the day?
 

Author Comment

by:PriorityResearch
ID: 39763783
I believe AppAssure uses VSS to take snapshots every hour. Our AppAssure server resides on a separate host with separate disk I/O, but it is still within the vCenter environment.

AppAssure has been present since we installed the VSA environment.

What tends to cause the hang?

Any ideas on how to turn on logging to see if that is the issue, or on preventive steps to stop it from rearing its ugly head again?

Edit: Does Previous Versions also use VSS?
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 39763915
Previous Versions does also use VSS, and I've found that VSS doesn't play nice with multiple partners.

I have a terminal server that exhibits identical behaviour to yours. Every so often it will develop a resource leak and eventually choke itself out to the point where the server is not functional. Because the leak is gradual, the server never logs anything until it gets to the point that it can't log anything. A reboot solves the problem until it happens again, which can be anywhere from a day to a couple of months later.

It's annoying for us to have this server drop off every once in a while but it doesn't sound like it affects you as much as it does us.  

A suggestion would be to install some monitors to keep an eye on your resources.  If you develop a leak it would be good to be able to stop it before it takes the server down.  As an MSP I have my own tools to use but I'm pretty sure you can configure the Windows monitors.  If not I'll help you find some third party software.
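To make the "keep an eye on your resources" idea a bit more concrete, here is a minimal Python sketch, not a tested monitoring tool. It assumes you have already collected periodic samples of some counter (for example committed memory or handle count logged by Performance Monitor) as (seconds, value) pairs. The function names and the threshold are hypothetical. It fits a least-squares line to the samples and flags a sustained upward slope, which is roughly what a gradual leak looks like.

```python
from typing import Sequence, Tuple


def slope_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of value against time, in units per hour.

    samples: (seconds_since_start, counter_value) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    # Slope is per second; scale to per hour for readability.
    return (num / den) * 3600.0


def looks_like_leak(samples: Sequence[Tuple[float, float]],
                    min_growth_per_hour: float) -> bool:
    """True when the counter trends upward at least as fast as the threshold."""
    return slope_per_hour(samples) >= min_growth_per_hour


if __name__ == "__main__":
    # Made-up data: a counter sampled every 10 minutes, growing 50 units per sample.
    demo = [(i * 600.0, 1000.0 + 50.0 * i) for i in range(12)]
    print(slope_per_hour(demo))
    print(looks_like_leak(demo, min_growth_per_hour=100.0))
```

The threshold has to be picked per counter. A trend check like this is only useful if the samples cover the 3-5 hour window in which the slowdown builds up.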

Based on your detailed description of the issue, I think the problem and the solution lie with this one VM, and it has nothing to do with it being a VM.
 

Author Comment

by:PriorityResearch
ID: 39763960
I'd wholeheartedly agree with you on your last statement. I've requested SolarWinds in the past but have never been able to make the ROI seem logical to the powers that be. What third-party tools do you use or would you suggest? Windows performance monitors are local, and I'd really prefer something that has central reporting, as we'd monitor more than just that one machine.
 
LVL 30

Expert Comment

by:pgm554
ID: 39764143
Agree, it sounds like a classic memory leak.

See:

http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
 

Author Comment

by:PriorityResearch
ID: 39765391
Thank you for the support so far.

I've checked in on a few things and noticed that for some reason my Previous Versions schedule had two triggers.

The first, which was intentional, was every hour from 7am to 7pm daily.

The second was daily at noon.

Perhaps the combination of the two triggers was causing a hang occasionally? If removing the extra trigger does not resolve the issue, I will try disabling Previous Versions entirely. (I don't want to, because recovering most deleted or accidentally overwritten files that way is much faster than mounting a restore point and searching for them in AppAssure.) Going to mark this as solved.
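A quick way to see where the two Previous Versions triggers described above collide is to expand both schedules and intersect them. This is only a sketch of the schedules as described in this thread (hourly from 7am to 7pm, plus daily at noon). Whether the 7pm run is included is an assumption, and the function and variable names are hypothetical. It shows that both triggers fire at the same moment; it does not show that the overlap caused the hang.

```python
from datetime import time


def hourly_runs(first_hour: int, last_hour: int) -> set:
    """Run times for an hourly trigger, both end hours included (assumption)."""
    return {time(hour=h) for h in range(first_hour, last_hour + 1)}


def overlapping_runs(schedule_a: set, schedule_b: set) -> list:
    """Times at which both triggers would fire, sorted."""
    return sorted(schedule_a & schedule_b)


hourly_trigger = hourly_runs(7, 19)   # 7am to 7pm, every hour
noon_trigger = {time(hour=12)}        # daily at noon

if __name__ == "__main__":
    for t in overlapping_runs(hourly_trigger, noon_trigger):
        print(t.strftime("%H:%M"))
```

With these two schedules the only shared run time is noon, so the two shadow copy requests would be issued at the same time once a day.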
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 39766353
Thanks.

I would try disabling Previous Versions for a couple of weeks. If the problem seems resolved, then you have probably found the cause.

I've used the Zenith platform and found that the VSS support in the backup software was what caused the problem far more often. The Zenith backup back-end is StorageCraft. Since I disabled the VSS support in the backup software, I think I've only had a couple of outages due to resource leaks.

Almost anything can be set up with SNMP traps that can be centrally managed for monitoring. I've found it to be a pain to set up, but there are plenty of third-party tools. SolarWinds is one of them, but I'm sure there are more. Again, I have much of this built into my MSP software, so I haven't really spent a lot of time looking.
Question has a verified solution.