
NFS Hang

I have built a new system, and although the load right now is very small, I am having some serious performance issues.

I have a single Linux box hooked up to the Open-E using NFS. It is connected to 3 different shares. Every so often (I can't find a pattern), NFS seems to lock up/hang for about a minute, meaning any read requests for that mount are queued. For example, I am unable even to run 'ls -la' on that directory until it comes back again.

Nothing is reported in /var/log/messages on the Linux box to suggest the connection was lost. Running a ping reports no errors; all replies come back in about 0.2 ms.

I mount the shares in fstab using default values as follows:

10.20.20.160:/data1     /data/data1             nfs     defaults        0 0
10.20.20.160:/system    /data/system            nfs     defaults        0 0
10.20.20.160:/logs      /data/logs              nfs     defaults        0 0


These defaults result in the following mount options:

[root@cnbflbs21 data1]# cat /proc/mounts | grep nfs
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
10.20.20.160:/data1 /data/data1 nfs rw,vers=3,rsize=524288,wsize=524288,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.20.20.160 0 0
10.20.20.160:/system /data/system nfs rw,vers=3,rsize=524288,wsize=524288,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.20.20.160 0 0
10.20.20.160:/logs /data/logs nfs rw,vers=3,rsize=524288,wsize=524288,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.20.20.160 0 0
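The options above can also be checked with a script. The following is a minimal Python 3 sketch (the function names `parse_nfs_mounts` and `timeout_seconds` are hypothetical, not from any library) that reads /proc/mounts and reports, for each NFS mount, whether it is hard or soft, `timeo` converted to seconds, and `retrans`. Per the nfs(5) man page, `timeo` is given in tenths of a second, so `timeo=600` is a 60-second RPC timeout; that this roughly matches the one-minute hang is only an observation, not something confirmed in the thread.

```python
# Sketch: summarise NFS options from /proc/mounts (assumes Python 3).
# Function names are illustrative, not part of any library.


def parse_nfs_mounts(text):
    """Return {mountpoint: {option: value}} for nfs/nfs4 lines."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            # Flag options such as "hard" or "rw" carry no value.
            opts[key] = value if value else True
        mounts[fields[1]] = opts
    return mounts


def timeout_seconds(opts):
    """Convert timeo (tenths of a second, per nfs(5)) to seconds, or None."""
    if "timeo" not in opts:
        return None
    return int(opts["timeo"]) / 10


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mountpoint, opts in parse_nfs_mounts(f.read()).items():
            # hard is the NFS default unless "soft" is listed.
            mode = "soft" if "soft" in opts else "hard"
            print(mountpoint, mode,
                  "timeo:", timeout_seconds(opts), "s",
                  "retrans:", opts.get("retrans", "?"))
```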

When the "hang" is happening I can still access the files directly on the Open-E share from a Windows box, so I am fairly certain it's the NFS part that is going wrong (at one end or the other), not just a performance bottleneck.

Finally, iostat on the Open-E shows very little activity; this box isn't busy.

Other info:
open-e = 5.0.DB49000000.3278
linux = CentOS 5.4 x64 (it is running as a guest on ESX 4)
RAID = RAID6, 3ware 9650SE, 15 x 1TB drives + 1 hot spare
Asked by: modell100699

arnold commented:
Run tail /var/log/messages and see whether any NFS errors are being reported.

The issue could be that the host or the guest is intermittently "overloaded".


Set up Cacti and SNMP to monitor the various systems and see whether a pattern emerges over time.
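As a lightweight complement to Cacti/SNMP (not something suggested in the thread), a small probe can timestamp each hang so it can later be lined up against host or network graphs. This minimal sketch assumes Python 3 is installed on the client (CentOS 5's stock Python is 2.4); the helper names, the 2-second threshold and the 10-second interval are arbitrary illustrative choices. A listing may be answered from the client's attribute cache, so a fast result does not prove the server is responsive, and a probe stuck on one hung hard mount delays the checks of the others.

```python
# Sketch: time a directory listing on each NFS mount and log slow ones.
# Assumes Python 3; helper names and thresholds are hypothetical choices.
import os
import time


def time_listing(path):
    """Return how many seconds os.listdir(path) took."""
    start = time.monotonic()
    os.listdir(path)
    return time.monotonic() - start


def slow_mounts(paths, threshold):
    """Return (path, seconds) for each path whose listing took >= threshold."""
    slow = []
    for path in paths:
        elapsed = time_listing(path)
        if elapsed >= threshold:
            slow.append((path, elapsed))
    return slow


if __name__ == "__main__":
    mounts = ["/data/data1", "/data/system", "/data/logs"]
    while True:
        # The timestamp is taken when a slow listing finally returns.
        for path, elapsed in slow_mounts(mounts, threshold=2.0):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), path,
                  "listing took %.1f s" % elapsed, flush=True)
        time.sleep(10)
```

Running it in the background with output appended to a log file gives a list of hang times to compare against the Cacti graphs.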
 
Duncan Roe (Software Developer) commented:
The rsize and wsize are set to half a megabyte (524288 bytes). I used to have problems with the old default of 8KB (but that was on a system under VMware, and NFS was using UDP).
I think your problem may be network retries. TCP retransmissions happen at a low level and usually aren't logged anywhere; you could confirm them by running tcpdump.
Or, as a quick test, try reducing the read and write sizes to, say, 8KB (this shouldn't noticeably affect normal operation):

Add ",rsize=8192,wsize=8192" to the options of the /etc/fstab entries for /data/data1 etc., so that "defaults" becomes "defaults,rsize=8192,wsize=8192".

Mount options are the 4th field on the line. They are commonly set to "defaults", but for NFS they may be different.
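Rather than hand-editing each line, the change can be made by a short script that writes a modified copy for review. This is a minimal Python 3 sketch; `add_nfs_options` is a hypothetical helper. It rewrites only the options field of nfs/nfs4 lines, drops any existing rsize/wsize so the new values are not duplicated, and leaves comments and other filesystems untouched. It rejoins fields with tabs, so the original column alignment is not preserved.

```python
# Sketch: append rsize/wsize to the options field of NFS lines in fstab.
# Assumes Python 3; add_nfs_options is a hypothetical helper name.
import sys


def add_nfs_options(line, extra=("rsize=8192", "wsize=8192")):
    """Return line with extra options set in field 4 if it is an NFS entry."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # blank lines and comments pass through unchanged
    fields = stripped.split()
    if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
        return line  # not an NFS mount
    new_keys = {opt.split("=", 1)[0] for opt in extra}
    # Drop any existing rsize/wsize so the new values are not duplicated.
    kept = [opt for opt in fields[3].split(",")
            if opt.split("=", 1)[0] not in new_keys]
    fields[3] = ",".join(kept + list(extra))
    return "\t".join(fields)


if __name__ == "__main__":
    # Read fstab on stdin, write the modified copy to stdout.
    for line in sys.stdin:
        print(add_nfs_options(line.rstrip("\n")))
```

For example, feed it /etc/fstab on stdin, redirect the output to a temporary file, inspect it, and only then copy it into place. The new sizes take effect once each share has been unmounted and mounted again.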
 
modell100699 (Author) commented:
I did say that nothing is reported in /var/log/messages. What would you suggest in terms of SNMP monitoring?

Duncan, the figures you suggest are a reduction, is that correct? Happy to try; I'm just not clear on why that would be a good thing.
 
arnold commented:
SNMP monitoring/polling will give you some perspective on what was going on leading up to the time the issue occurred,
e.g. CPU or network activity spikes, as well as a view of the server providing the NFS share.
 
Duncan Roe (Software Developer) commented:
Yes, I am suggesting a reduction in the size of each read and write transfer. I think 8KB should still give you excellent performance, but if there is something flaky about the network (it could be a cable, a plug connection, the NIC speed, or something else), the likelihood of it causing problems decreases dramatically. Just try it.
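One way to look for the flaky network suspected here, without a full tcpdump capture, is to watch the kernel's TCP retransmission counter (the same figure `netstat -s` reports). This minimal Python 3 sketch, with hypothetical helper names, reads the RetransSegs field from the `Tcp:` lines of /proc/net/snmp every 10 seconds and prints any increase. The counter is system-wide, so it covers all TCP traffic on the client, not only the NFS connection; a burst that coincides with a hang would be suggestive rather than conclusive.

```python
# Sketch: report increases in the system-wide TCP RetransSegs counter.
# Assumes Python 3 on Linux; helper names are hypothetical.
import time


def tcp_retrans_segs(snmp_text):
    """Return RetransSegs from /proc/net/snmp text (header line, then values)."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    header, values = tcp_lines[0], tcp_lines[1]
    return int(values[header.index("RetransSegs")])


def read_retrans(path="/proc/net/snmp"):
    with open(path) as f:
        return tcp_retrans_segs(f.read())


if __name__ == "__main__":
    previous = read_retrans()
    while True:
        time.sleep(10)
        current = read_retrans()
        if current > previous:
            print(time.strftime("%H:%M:%S"), "TCP segments retransmitted:",
                  current - previous, flush=True)
        previous = current
```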
 
modell100699 (Author) commented:
Thanks, trying...
 
modell100699 (Author) commented:
It turns out that I had a problem with the load-balancing mode on one of the NICs, which was causing packet loss.

Thanks