Solved

File Server Performance Issues

Posted on 2013-01-24
Last Modified: 2013-04-24
Greetings,

I have a 64-bit Windows Server 2008 Standard file server that is giving me some grief.  Basically, after a variable amount of time (days to weeks), my users will start noticing that they cannot save files or log into our Terminal Servers.  The common denominator is that the file server hosts the roaming profiles and the HOME and shared directories.  In total, the server has just over 5.5 million files.  It is virtual (VMware).  We have about 450 users, with 300-400 being concurrent at peak times.  The roaming profiles are only used for Terminal Server sessions (Server 2003).  Upon reboot, everything begins to function normally.  The only additional programs on the server are Trend OfficeScan (antivirus) and Diskeeper Undelete (salvage program).  Both have been in place for several months.  This problem is recent (it started after a physical move of our server environment in November 2012).  However, there don't seem to be any obvious changes or errors.  There are no Event Viewer errors to shed any light.  I've heard (and read) that 4 million files can be a threshold for certain server performance issues, especially backups.  However, it is unlikely that we have had any significant changes in file count over the last 9 months.  The server is allocated 8 CPU and 12GB RAM.  I have 5 allocated drives (although one can be deleted at any time).

I know that the symptoms are vague, so I'm just trying to get some fresh brainstorming or previous experiences out there.  Is it a bad design to have everything on one server?  We have 3.6 million files in our HOME directories alone.  As the profiles are only hit during logon and logoff of the Terminal Servers, I wouldn't expect the impact to be significant.  The shared and HOME directories are accessed by all users all the time.  Is there some kind of formula I should be following regarding resources versus users versus files?  I've been trying to inquire into best practices, but it seems to be all over the place.  The NIC is never more than 50% utilized (during backups overnight) and CPU utilization is normally less than 10%, while memory utilization is 50-90%.  The memory utilization concerned me, but upon reading more into Server 2008, it appeared to be normal.
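For anyone wanting to check whether the file count really has stayed flat, here is a minimal Python sketch that tallies files per top-level folder (for example, one entry per user under the HOME share). The function name and the example path are illustrative assumptions, not anything from the environment described above, and a walk over millions of files will itself generate load, so it is probably best run off-hours.

```python
import os
from collections import Counter


def count_files_by_top_level(root):
    """Return a Counter mapping each top-level folder under root to its file count."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files sitting directly in root are grouped under "."; everything
        # else is attributed to the first path component below root.
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts


# Example usage (hypothetical path):
#   for name, n in count_files_by_top_level(r"D:\HOME").most_common(20):
#       print(f"{n:>10}  {name}")
```

Saving the output on a schedule and comparing runs would show whether any one folder is growing unexpectedly.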

Anyhow, I understand that this is pretty vague, but I do appreciate any insight that anyone can provide.  Obviously, having to reboot this server during business hours is not acceptable.

Thanks,

Jeremy
Question by:Jer
11 Comments
 

Assisted Solution

by:RickEpnet
RickEpnet earned 100 total points
You said it is virtual. Have you added any new servers to the host lately? How much memory and how many CPUs have you given this VM? What do the performance monitors in VMware say about CPU and disk access?

Expert Comment

by:Andrew Hancock (VMware vExpert / EE MVE)
8 vCPU seems excessive. Have you overcommitted the CPUs?

Very few servers require more than 2 vCPU!

Excessive vCPU can cause performance issues in the VM.

Also, what is the underlying datastore that the VM is stored on?

Author Comment

by:Jer
Rick - We've added 2-3 servers to our virtual environment.  Nothing significant in size.  CPU usage is pretty regular on a daily basis, with values ranging from 250-3000 MHz.  Memory usage is 4-25%.  Virtual Disk Rate is 0-60,000 KBps.  Virtual Disk Requests are 0-1000.  The Network rate is 0-800 Mbps.  Network Packets received are 0-22,500,000, transmitted are 0-8,000,000.

Han - It is quite possible that we've overcommitted vCPU.  We're working with a 3rd-party that supports our VMware environment, while we support the actual servers and the connectivity.  We just moved to virtual last year, so I still have plenty of noob moments with it.  As I couldn't get any clear answers on best practices at the time of migration from physical to virtual (this server was a fresh build, not P2V), I maintained the specs of our physical environment.  Hence, the 2 sockets and 8 CPU.  As sockets only matter for select applications, I could easily reduce this server to 1 socket and 2-4 vCPU.  Do you know of formulas/guidelines to follow?  I'm not sure I understand your question about the datastore.  What is it that you want to know about the datastore?

Thanks for any input.

Jeremy

Expert Comment

by:Andrew Hancock (VMware vExpert / EE MVE)
What disks have been configured for your ESXi server?

RAID 5 or RAID 10? How many disks, and of what type (SATA, SAS)?

Yes, reduce to 1 vCPU, check performance, and increase if necessary.

Expert Comment

by:RickEpnet
Is your storage local, or an iSCSI or FC SAN? I agree 100%: reduce the vCPUs.

Author Comment

by:Jer
The hosts are (3) Dell PowerEdge R710 (8 core, 96GB RAM) servers attached to a NetApp 3210 SAN with 2 SAS 24x45GB shelves, RAID DP.  This particular server interacts with 3 datastores (SYS_NoRep, Data_Rep, and Data_NoRep).  In general, the hosts 'seem' to be rather underutilized.
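As a rough illustration of the overcommitment question raised earlier in the thread, the arithmetic can be sketched in a few lines of Python. The host and core counts come from the figures above; the cluster-wide vCPU total is not given anywhere in the thread, so the value 60 below is purely a placeholder, and the function names are made up for this sketch. It ignores hyperthreading and any reservations.

```python
def vcpu_to_core_ratio(total_vcpus, hosts, cores_per_host):
    """Cluster-wide ratio of allocated vCPUs to physical cores."""
    return total_vcpus / (hosts * cores_per_host)


def vm_share_of_host(vm_vcpus, cores_per_host):
    """Fraction of a single host's cores one VM spans when all its vCPUs run."""
    return vm_vcpus / cores_per_host


HOSTS = 3            # from the thread: three PowerEdge R710 hosts
CORES_PER_HOST = 8   # from the thread: 8 cores each

# Placeholder: the total vCPU count across all VMs is an assumption.
ASSUMED_TOTAL_VCPUS = 60

print(vcpu_to_core_ratio(ASSUMED_TOTAL_VCPUS, HOSTS, CORES_PER_HOST))  # 2.5
print(vm_share_of_host(8, CORES_PER_HOST))  # 1.0, the current 8 vCPU file server
print(vm_share_of_host(2, CORES_PER_HOST))  # 0.25, after reducing to 2 vCPU
```

The second figure is the one that matters for this server: at 8 vCPU the VM is as wide as an entire host, which leaves the scheduler little room to place it alongside other VMs.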

Expert Comment

by:Andrew Hancock (VMware vExpert / EE MVE)
Okay, well, you should have enough IOPS on the disks, although because the I/O is virtualised it will not perform as well as your filer.

So why are you not using your filer as a NAS with CIFS shares? Why use a virtual Windows Server with an FC or iSCSI LUN? Windows will perform poorly compared to your SAN.

I would recommend migrating the roaming profiles, home drives, and group data from your VM server to NetApp CIFS shares. You would benefit from SAN snapshots, Previous Versions for end users to perform their own restores, and dedupe on the volume to reduce space. You can also get a Trend Micro plugin to do antivirus scanning on the NetApp.

Performance will be superior.

Best practice: dump Windows in favour of your NAS with CIFS. We migrate clients' file servers to NetApp SANs, at the same time using iSCSI for VMware and NFS for Unix and Linux clusters, all in the same box: yes, CIFS, NFS and iSCSI.

Author Comment

by:Jer
We're FC SAN.  While I can appreciate your position on the CIFS shares, the fact is that we are using a Windows file server at this point and I need to know if there is something configured wrong.  My understanding is that our current virtual Windows server should handle its current load without issue.  While I agree that your recommendation could certainly have positive results, it is not something that we are going to do without further understanding any other impact, such as nightly backups and such.  As our environment continues to evolve, there are certainly many aspects that we'll be updating to current best practices.  However, as mentioned, the big issue here is trying to address the current environment before the server stops responding again.  

As of right now, the next step in troubleshooting is moving the drive containing the HOME dirs over to a new Server 2008 R2 machine.  Unfortunately, this means addressing 450+ user profiles (including Terminal Server tab) without any definitive proof that it will address anything.  I'm really hoping to find something obviously wrong with the file server or virtual environment.  

We are looking to address the vCPU allocation.

Accepted Solution

by:Andrew Hancock (VMware vExpert / EE MVE)
Andrew Hancock (VMware vExpert / EE MVE) earned 400 total points
Reduce the vCPU count and use the vmxnet3 network interface. Microsoft has never got roaming profiles right; they still cause issues at login and logoff.

You may want to consider folder redirection, reducing the size of profiles, or the use of ProfileUnity by Liquidware Labs.

Author Comment

by:Jer
It looks like one of the contributing factors may have been our Undelete program.  I'm working with Diskeeper to see if something is amiss.  I've had it disabled for 3+ weeks and the server has been stable.  Still monitoring.  Will look into other suggestions.
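For the "still monitoring" part, one option is a small write probe run on a schedule from a client or Terminal Server, so that the "cannot save files" symptom shows up in a log before users report it. The following is only a sketch: the function names, the 2-second threshold, and the share path in the usage comment are all assumptions, not anything taken from this environment.

```python
import os
import time
import uuid


def probe_write(directory, payload=b"x" * 4096):
    """Write, sync, and delete a small file in directory; return elapsed seconds."""
    path = os.path.join(directory, f"probe_{uuid.uuid4().hex}.tmp")
    start = time.perf_counter()
    try:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the data past the local cache
    finally:
        # Clean up even if the write failed part-way through.
        if os.path.exists(path):
            os.remove(path)
    return time.perf_counter() - start


def check_share(directory, slow_after=2.0):
    """Return (status, seconds, error) where status is OK, SLOW, or FAILED."""
    try:
        elapsed = probe_write(directory)
    except OSError as exc:
        return ("FAILED", None, str(exc))
    status = "SLOW" if elapsed > slow_after else "OK"
    return (status, elapsed, "")


# Example usage (hypothetical UNC path):
#   print(check_share(r"\\fileserver\HOME\probe"))
```

Logging the result every few minutes would give a timeline to compare against whatever changes are made on the server, such as disabling or re-enabling Undelete.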

Author Comment

by:Jer
Greetings.  The problem was isolated to the use of Undelete.  I do appreciate all the other suggestions, as I'm always looking to identify best practices.  Thanks.