welly192

Windows Server 2003 stops serving file shares - cannot Remote Desktop - out of resources

Windows Server 2003 file service problems. Large file transfers become impossible after a few hours of
heavy use, and clients get an out-of-memory error. We cannot Remote Desktop to the machine and must reboot it.
Windows Server 2003 Standard; this only happens under heavy load. The server serves 4 TB of data, most of it housed on a SAN.
The server reports no errors. The hardware is new. The network has over 1000 nodes on one flat switched LAN. Antivirus has been removed; the only other third-party application is Backup Exec.

This follows an earlier problem in which the previous server (different hardware) simply stopped
serving network shares. After a reboot it would serve files for a few minutes and then stop. Packet traces
showed that the server would not respond to the clients' handshake requests. PSS was contacted and could not
identify the problem; it was traced to the Server service stopping. Now we have the above problem with a new server with 4 GB RAM. At peak we have ~400 open file sessions. Is Windows Server 2003 not up to the task?

Basic file I/O should not be this difficult, right? A Windows 2000 server served this same
data reliably for 5 years. We are running Windows Server 2003 R2 SP2.

Thanks for any insight

 
Erik Pitti

This sounds like a memory leak.  A few questions:

Are you running Windows Server with the /3GB switch?  

Which Windows Edition, and processor architecture x86 or x64?

Have you tried running perfmon and watching the following counters?

Memory/Free System Page Table Entries
Memory/System Driver Resident Bytes (would help find a memory leak in a driver)
Memory/Available MBytes


(These would be useful in troubleshooting a leak in the Server service, but I doubt you'd find anything, especially after dealing with PSS.)
Server/Blocking Requests Rejected (used with Work Item Shortages to track work items)
Server/Errors System
Server/Files Open
Server/Sessions
Server/Work Item Shortages
Server/Pool Nonpaged Bytes
Server/Pool Nonpaged Failures
Server/Pool Paged Bytes
Server/Pool Paged Failures
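One way to check the counters above for a slow leak, rather than eyeballing graphs, is to export the perfmon log to CSV and fit a trend line to each counter. The Python sketch below assumes a typeperf/relog-style CSV (timestamp in the first column, one counter path per remaining column); the function names and the list of "falling is bad" counters are illustrative choices, not part of any Microsoft tool.

```python
import csv
import io

# Counters where a steady DROP (rather than a rise) suggests exhaustion.
# Matched against the end of each column header, e.g. a path ending in
# "Memory\Available MBytes". This list is an illustrative assumption.
FALLING_IS_BAD = ("Free System Page Table Entries", "Available MBytes")


def slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def leak_suspects(csv_text):
    """Return {counter_path: slope} for counters trending in a leak-like direction.

    Assumes a typeperf/relog-style CSV: the first column is the timestamp
    and every other column holds one counter.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    suspects = {}
    for col, name in enumerate(header[1:], start=1):
        values = []
        for row in samples:
            try:
                values.append(float(row[col]))
            except (ValueError, IndexError):
                continue  # skip blank or missing samples
        s = slope(values)
        falling_is_bad = name.endswith(FALLING_IS_BAD)
        # Flag falling "free" counters, and rising everything else.
        if (falling_is_bad and s < 0) or (not falling_is_bad and s > 0):
            suspects[name] = s
    return suspects
```

Counters such as Server/Files Open or Server/Sessions also rise during a normal busy period, so treat the result as a shortlist of counters to look at, not a diagnosis.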




Netman66
It also sounds like either a bad HBA or bad drivers for it.

You can try enabling the Firewall Service on the server but disabling the Firewall itself.

welly192

ASKER

Hi, thanks for responding.

Are you running Windows Server with the /3GB switch?
No /3GB switch.
Which Windows Edition, and processor architecture x86 or x64?
x86

The HBA drivers are current, we see no errors on the SAN, and
local I/O between LUNs is fine. We are also serving 256 file shares,
and the system load never exceeds 15-20%. Full backups ran over the weekend
without any issues. The HBA is common to both servers, though, and just before
the original failure we did have a failure of the SAN switch that this HBA was attached to.
It is single-path attached. I will investigate this in addition to the performance monitors.

Thanks!


Try these steps also:

1) Create/edit these values in the registry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
"PoolUsageMaximum"=dword:00000030
"PagedPoolSize"=dword:ffffffff
2) Restart.
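Note that dword data in regedit's format is hexadecimal, so dword:00000030 is 48 decimal, not 30, and dword:ffffffff is 4294967295. If PoolUsageMaximum is read as a percentage (as Microsoft's paged-pool tuning articles describe it), that is a 48% threshold. A small Python sketch for decoding such lines, assuming the exact "Name"=dword:XXXXXXXX layout shown above (the function name is hypothetical):

```python
import re

# Matches lines like "PoolUsageMaximum"=dword:00000030 (regedit export format).
DWORD_LINE = re.compile(r'^"(?P<name>[^"]+)"=dword:(?P<hex>[0-9a-fA-F]{8})$')


def decode_dwords(reg_text):
    """Return {value_name: decimal_int} for every dword line in reg_text."""
    values = {}
    for line in reg_text.splitlines():
        match = DWORD_LINE.match(line.strip())
        if match:
            # .reg dword data is written as 8 hex digits.
            values[match.group("name")] = int(match.group("hex"), 16)
    return values


if __name__ == "__main__":
    tweak = '"PoolUsageMaximum"=dword:00000030\n"PagedPoolSize"=dword:ffffffff'
    for name, value in decode_dwords(tweak).items():
        print(f"{name} = {value}")
```

Lines that are not dword values, such as the key path itself, are simply skipped.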
Here is the message our clients get. It only happens on
larger files. There are plenty of references to this error on Google,
and it appeared to have been fixed in earlier service packs.
Whenever we get this error, we cannot RDP to the box.

Copying c:\NIGHTLY\Wed to e:\Wed at  7:27:25.10
File creation error - Not enough server storage is available to process this command.
Does that local PC have sufficient free drive space? It almost sounds as if the server doesn't have enough storage space to cache all the data. Maybe the swap file isn't big enough? It sounds related to drive storage on that PC somehow...
I refer back to the following perfmon counters:
(These would be useful in troubleshooting a leak in the Server service, but I doubt you'd find anything, especially after dealing with PSS.)
Server/Blocking Requests Rejected (used with Work Item Shortages to track work items)
Server/Errors System
Server/Files Open
Server/Sessions
Server/Work Item Shortages
Specifically this item:

Work Item Shortages

Shows the number of times that no work item was available or could be allocated to service the incoming request. A work item is the location where the server stores an SMB. The number available fluctuates between a minimum and a maximum value, set according to how the server is configured and the amount of memory on the computer. If work item shortages are occurring, they may be caused by an overloaded server. If the Work Item Shortages counter value is increasing, consider changing the registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\Maxworkitems. Allowing this value to be reached consistently initiates flow control, which hurts performance. This value is always 0 in the Blocking Queue instance.

http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/counters/counters2_tdaf.mspx?mfr=true
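The description above treats Work Item Shortages as a count, so what matters is whether it goes up between samples. A hedged Python sketch of that check, assuming (timestamp, value) pairs exported from the perfmon log and assuming that a drop in the value means the counter was reset (for example by a reboot); the function name is hypothetical:

```python
def shortage_increases(samples):
    """Return (start, end, increase) for every interval where shortages rose.

    samples: list of (timestamp, cumulative_count) pairs in time order.
    Assumption: a drop in the count means the counter was reset (e.g. the
    server rebooted), so the new value is counted from zero.
    """
    increases = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 > v0:
            increases.append((t0, t1, v1 - v0))
        elif v1 < v0 and v1 > 0:
            # Counter restarted at 0 during this interval.
            increases.append((t0, t1, v1))
    return increases


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    log = [("07:00", 0), ("07:05", 0), ("07:10", 3), ("07:15", 7), ("07:20", 2)]
    for start, end, added in shortage_increases(log):
        print(f"{start}-{end}: +{added} work item shortages")
```

A log in which the counter stays at 0 throughout yields an empty list, which suggests work items are not the bottleneck.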
Thanks again for your response; you have provided valuable insight. We are monitoring the
perfmon counters but have not had the problem reproduce yet. We have checked the HBAs,
ran tests against them, and sent the results to the vendor, who certifies that the drivers are the latest and
fully operational with no problems. I will report back when we know more; this is a bizarre
problem.


Truly bizarre, let us know what you find.
The problem just occurred again. We had been running OK since Monday PM.
We cannot find anything unusual in the log.

I know this is a lot to ask, but could you take a look and see if anything looks
suspicious? http://www.sdfishing.com/bite/serverperfmon.zip
Specifically, note the 0 on Work Item Shortages. Currently we cannot RDP into the
box and large file transfers fail. This behavior may disappear, and shortly afterward we may be
able to access the box via RDP again as the problem goes away. This is worth way more than
500 points!
Reviewing now.
Not seeing anything out of the ordinary in the log.  It all looks okay.

Are there any SVCHOST.EXE related errors in the System or Application Log?
Have you run Windows Update recently? If not, could you? There are certain circumstances in which the Automatic Updates or Windows Installer service dies while installing a hotfix and crashes the SVCHOST.EXE process, which brings down the RDP services (Remote Desktop and Remote Assistance) as well as any number of other services that run in the SVCHOST process (like Computer Browser or Server).

You could also try running Microsoft's Server Performance Advisor:
http://www.microsoft.com/downloads/details.aspx?FamilyID=09115420-8c9d-46b9-a9a5-9bffcd237da2&DisplayLang=en
No svchost.exe errors.
Windows Update probably has not been run since it got all the updates 2-3 weeks ago when the server was set up. RDP services are still running and are responsive at times, but not responsive during others. We also see other issues besides the RDP denial during these “down times” like the server not having enough resources to load the local admin’s profile when logging in at the console, etc.
ASKER CERTIFIED SOLUTION
Erik Pitti
This makes total sense to me, but it is hard to get objective data from the network.
They take 5-minute MRTG readings, but we need peak and variance. We have a flat
switched LAN, and packet captures show all kinds of different protocols and
broadcasts. We ran Ethereal on both the client and the server at the time. We saw nothing
on the client, but on the server side we saw some packets being reassembled
even though the packet size was less than the MTU. So that was curious.
You could use perfmon to monitor utilization and error rates on the network interface, although you cannot get the exact detail you could with Ethereal/Wireshark.
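Why a 5-minute average is not enough: a short burst barely moves the mean. The Python sketch below uses made-up one-second throughput samples (a 10-second burst at 100 Mbit/s over a 10 Mbit/s baseline, purely hypothetical numbers) and compares each 300-sample window's mean with its peak and variance, roughly what an MRTG-style 5-minute poll would and would not show.

```python
from statistics import mean, pvariance


def window_stats(samples, window):
    """Return (mean, peak, population variance) for each full window of samples.

    A trailing partial window is ignored, the way a fixed-interval
    poller would not report it until the interval completes.
    """
    stats = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        stats.append((mean(chunk), max(chunk), pvariance(chunk)))
    return stats


if __name__ == "__main__":
    # Hypothetical 1-second throughput samples in Mbit/s: a 10 Mbit/s
    # baseline with one 10-second burst at 100 Mbit/s.
    per_second = [10] * 140 + [100] * 10 + [10] * 150
    for avg, peak, var in window_stats(per_second, window=300):
        print(f"mean={avg} peak={peak} variance={var}")
```

For that window the mean is 13 Mbit/s while the peak is 100 Mbit/s, which is exactly the gap a 5-minute average hides; logging peak and variance alongside the mean makes such bursts visible.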
Although the problem still exists, I wanted to ensure you got credit for your excellent advice.
The performance log was inconclusive, but it is a valuable tool. We are still having the issue
and have decided to abandon this server; we have ordered the Microsoft storage server
NAS appliance, which is supposed to be optimized for file I/O and services.

Thanks again
Ray
Thanks, Ray!