Solved

Optimising NFS for network homes

Posted on 2010-11-08
1,918 Views
Last Modified: 2013-12-02
Could someone please provide me with some information on optimising NFS for home folders?

I have 8 Apple Xserves running network home folders via NFS and we are experiencing some slow performance.

All Xserves are Dual Intels with between 3 and 6 GB of RAM, all running Snow Leopard Server 10.6.4. We have set the NFS threads on the servers to 40 and the clients to use 32.

I have read this topic, but the information seems to be 11 years old (http://www.experts-exchange.com/Software/Server_Software/File_Servers/nfs/Q_23471924.html?sfQueryTermInfo=1+10+30+nf+optim)

These servers serve up to 950 workstations.

Is there a set number of threads per connection / server, or other best practice to follow?

Thanks
0
Comment
Question by:gmbaxter
34 Comments
 
LVL 20

Accepted Solution

by:
woolnoir earned 250 total points
ID: 34083661
Have you identified the source of the slow performance? Have you checked network saturation levels (look at the switches) or the servers themselves (NIC, disk, CPU)? It might be resource related, or something optimisation can help with.
0
 
LVL 3

Expert Comment

by:sameer_dubey
ID: 34083739
How are the servers connected to the workstations: GbE or less? You might want to check if the network itself is the bottleneck.

You could also try the following:

1. Use NFS over UDP only

This link might be useful: http://nfs.sourceforge.net/nfs-howto/ar01s05.html
0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34083777
All servers are connected via gigabit to gigabit switches. Most clients connect to gigabit switches also.

CPU usage on servers is 10% max
Network usage is 20MB/s max
Disk activity is 24MB/s max

Typically the network is running 20-40% of available bandwidth
0
 
LVL 15

Expert Comment

by:JBond2010
ID: 34083823
You want to look into jumbo frames and also expand your NFS read/write data size. There are lots of guides to NFS performance tuning on the 'net, like this one, which doesn't look too out of date.

Also realize that you're never going to be able to do better than disk speed, so make sure that's not a bottleneck.

To enable jumbo frames, do this:

ifconfig eth0 mtu 9000
Here are the mount options that I use when automounting home directories from our filer

rw,intr,soft,nfsvers=3,tcp,nolock,noatime,rsize=32768,wsize=32768
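
A sketch of how those two steps might be applied on a Linux client. The interface name, server name, and paths below are placeholders, and the switch ports must also support jumbo frames:

```shell
# Enable jumbo frames on the client NIC (placeholder interface name).
ifconfig eth0 mtu 9000

# Mount a home-directory export with the options above
# (server and paths are hypothetical).
mount -t nfs -o rw,intr,soft,nfsvers=3,tcp,nolock,noatime,rsize=32768,wsize=32768 \
    filer:/export/home /mnt/home
```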
0
 
LVL 15

Expert Comment

by:JBond2010
ID: 34083836
Turning Off Autonegotiation of NICs and Hubs
Sometimes network cards will auto-negotiate badly with hubs and switches, and this can have strange effects. Moreover, hubs may lose packets if they have different ports running at different speeds. Try playing around with the network speed and duplex settings.

The NFS protocol uses fragmented UDP packets. The kernel has a limit of how many fragments of incomplete packets it can buffer before it starts throwing away packets. With 2.2 kernels that support the /proc filesystem, you can specify how many by editing the files /proc/sys/net/ipv4/ipfrag_high_thresh and /proc/sys/net/ipv4/ipfrag_low_thresh.

Once the number of unprocessed, fragmented packets reaches the number specified by ipfrag_high_thresh (in bytes), the kernel will simply start throwing away fragmented packets until the number of incomplete packets reaches the number specified by ipfrag_low_thresh. (With 2.2 kernels, the default is usually 256K). This will look like packet loss, and if the high threshold is reached your server performance drops a lot.

One way to monitor this is to look at the field IP: ReasmFails in the file /proc/net/snmp; if it goes up too quickly during heavy file activity, you may have a problem. Good alternative values for ipfrag_high_thresh and ipfrag_low_thresh have not been reported.
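
On a Linux NFS server, those thresholds and the ReasmFails counter can be inspected and adjusted like this (the values shown are illustrative, not recommendations):

```shell
# Current reassembly buffer thresholds (bytes).
cat /proc/sys/net/ipv4/ipfrag_high_thresh
cat /proc/sys/net/ipv4/ipfrag_low_thresh

# Raise them (illustrative values; needs root).
echo 524288 > /proc/sys/net/ipv4/ipfrag_high_thresh
echo 393216 > /proc/sys/net/ipv4/ipfrag_low_thresh

# Watch reassembly failures under load; the second "Ip:" line
# holds the ReasmFails counter.
grep '^Ip:' /proc/net/snmp
```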

0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34083850
Hi, did you mean to link to an NFS guide in your last post?

I'll try jumbo frames.

Where do you set these mount options? I've only used the GUI front end to NFS.

Also, we are set to use NFS over UDP and TCP - I wondered whether to change this to TCP only, as the majority of our network is one large class B network.

@sameer_dubey - I'll check that link out, thanks.
0
 
LVL 20

Expert Comment

by:woolnoir
ID: 34083949
Can you be a little more specific about the performance issues you are getting: are they client side or server side?
0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34084030
Clients are experiencing slow logins (after the authentication stage), slow document saving and opening, slowdown when browsing folders in their area, and general sluggishness.
0
 
LVL 20

Expert Comment

by:woolnoir
ID: 34084054
Is this confined to a particular time or physical location? Any specific server which can be isolated?
0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34084257
No, it is generally site-wide. Obviously the few clients on a 100Mb network will be slower, but it has been reported in most areas of the site, which are all fed by different fibre runs and different switch rooms. It seems to be across all servers and occurs during peak times (9am - 3.30pm), so it would seem to be load related.

NFS server threads were set at 20 by default on all file servers. Doubling this to 40 did improve things somewhat, but not enough.
0
 
LVL 8

Expert Comment

by:et01267
ID: 34084621
Are you forced to use NFS? It is abysmally slow under the best of circumstances.  I was using NFS 20 years ago, and I think other protocols may have outpaced it just a little :)

However...

You probably looked at this thread http://hints.macworld.com/article.php?story=20030504042711312 but the last comment is interesting:  set the export on the server to "async" mode.  
0
 
LVL 20

Expert Comment

by:woolnoir
ID: 34084691
> We have set the threads for NFS on the servers to 40 and the clients to use 32.

Is there any reason you have the clients set to 32 threads? Having the servers at a high number is useful and will aid performance, but 32 threads on the client side seems a little high. If you have a lot of clients, that will potentially be a LOT of connections.
0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34085197
@  et01267

We have to use NFS. AFP is useless under 10.6 Server - it crashes all the time and destroys file permissions. Under 10.5 Server it ramps all CPU cores up to 100% and runs like a dog. Under 10.4 Server the access control list implementation is seriously flawed. SMB is not an option as it cannot automount users' home folders, so this leaves us with NFS as the only solution.

@ woolnoir

A consultant advised that the client side setting may need changing, we are planning on going to TCP only and 99 server threads on the server.
0
 
LVL 8

Assisted Solution

by:et01267
et01267 earned 250 total points
ID: 34085805
Did you check that the export is using async? Not sure if this is an option for you, but I recall 20 years ago it made a big difference.

Yeah, I thought about the automount issue after I posted.  If you can't get your NFS to work fast, you might try a 3rd party SMB product, like Dave http://www.thursby.com/products/dave.html
0
 
LVL 21

Expert Comment

by:robocat
ID: 34086435

Are you sure the problem is situated in NFS and not in the disks being unable to cope?

How many disks are in the RAID configuration, and what kind of disks (SAS or SATA)? Software RAID or hardware RAID? How many IOPS are you getting on the disk system?

0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34087676
@ et01267

I can't see an option for async in the OS X gui - where should I be looking please?


@ robocat

The problem is definitely situated in NFS, as when AFP was working correctly (on the same hardware and setup) it ran smoothly; the issues with AFP were not speed or performance related.

Our 8 file servers use direct attached storage, comprising non-RAIDed SATA disks.
Four of the servers have two disks dedicated to sharing (one sharepoint each), and the other two have single disks dedicated to sharing. The last server has a 16 x 1TB RAID 5 array made up of 15 x 1TB SATA disks plus 1 hot spare. This is attached to the server via two 4Gbps fibre links (active-passive failover to two separate controllers). Even this server experiences the sluggish performance when running network homes.

How can I measure IOPS please?
0
 
LVL 8

Expert Comment

by:et01267
ID: 34087733
It may be in the /etc/nfs.conf file

Check the man page or this:
http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man5/nfs.conf.5.html

You should look for nfs.client.allow_async and set it to 1 -- there may not be a line in the file, in which case you'll need to add one.

You'll need to restart the NFS server, and you may want to check how your clients are mounting; the client also has to specify async.
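
A sketch of what that might look like (the key name comes from the nfs.conf(5) man page; treat this as an example, not a tested configuration):

```shell
# /etc/nfs.conf -- add the line if it is not already present
nfs.client.allow_async = 1   # let clients request async writes
```

After editing, restart nfsd, and remember the mounts themselves must also specify the async option.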
0
 
LVL 15

Expert Comment

by:roylong
ID: 34087830
If pretty much all your clients are accessing your servers via 1Gbps, and the servers themselves are connected via 1Gbps too, surely you've got some kind of contention on those links? Is there a large amount of graphics work going on over those links?
0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34087901
@ et01267

I'm unsure about enabling async, as it only mentions that writes speed up - the main delays are logging in and opening apps, which are mainly read operations. Plus, data integrity is at risk. It may be worth a trial, however.

@ roylong

I would imagine so, however we restrict music and film production to working locally. Music production students save locally, and upload their smaller edited files at the end of a lesson. Film production students use an external drive as a scratch disk and also work locally. When they have a finished project, again they just upload this to their area - these students store their work on the server with the fibre-attached raid storage.

Some students produce graphics work, running from network homes but these files are usually small as they are mostly for print, and we limit the filesizes which the printers can accept.

Also, caches and Microsoft User Data are redirected locally to the /tmp folder on the workstations.
0
 
LVL 8

Expert Comment

by:et01267
ID: 34087929
Yeah, if I recall, the async is what improved the NFS mount time significantly.  Try it and see if it helps.

The data integrity issues are somewhat overstated, I think.
0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34094532
I've found the NFS settings on the server under /etc/nfs.conf and have successfully been making changes there.

On the clients, however, there seems to be no corresponding conf file - how do I make conf changes on the clients please?
0
 
LVL 8

Expert Comment

by:et01267
ID: 34094743
How are you configuring the NFS mounts on the clients? I know there are some shareware utilities for Mac OS that can do this - for example, I've used NFS Manager.

The client settings exist in the same /etc/nfs.conf file, according to the man page.

The settings for async are nfs.client.allow_async and nfs.server.async

Then you need to set up your mounts so that they specify async.

There are a zillion tuning parameters -- maybe the NFS Manager software has the right mojo.
0
 
LVL 8

Expert Comment

by:et01267
ID: 34095029
Oh, one other thing:  your clients should probably be mounting with the "soft" option.  This also greatly speeds mount times, if I recall.

You can see the NFS Manager server options for async setting here:

Screen-shot-2010-11-09-at-12.10..png
0
 
LVL 15

Expert Comment

by:roylong
ID: 34095701
NFS Manager is a very good application for managing NFS on macintosh - I have used it for many years.
0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34097198
Thanks, I'll take a look at that tomorrow. How does it work? Surely I don't have to set that up on each client - we have nearly 1,000 Macs. Or is it just a GUI for the /etc/nfs.conf on the servers?

That screenshot seems to relate to the nfs.conf man page i've been looking at.

There is no nfs.conf on the 10.6 clients we're using - I've been looking today and just cannot find it. Perhaps I have to create it?
0
 
LVL 8

Expert Comment

by:et01267
ID: 34097307
Well, that's why I asked "how do your Mac clients get configured now?"

As far as I know, NFS doesn't have any sort of discovery feature, so the Mac automount stuff needs to be set up somehow. I'm not really an Xserve savant, so perhaps there is some management utility that can do it.

The NFS Manager help (which is *excellent*) says this:

"NFS Manager can be used to access the Open Directory data of a remote computer over the network. This makes it possible to configure the automount entries of a Mac OS X computer remotely."
0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34097444
OK, here's what happens now:

On the file servers, create shares and assign ACLs and POSIX permissions.
Set which protocols can be used to access the share (in my case NFS), map root to nobody, and set security.
Bind the file servers into the directory (Open Directory master) - basically a pretty standard OpenLDAP server.
Enable automount on the shares which contain users' home folders: choose which directory to automount them to, select which protocol, and authenticate with directory administrator credentials. That's it - no more automount options.
0
 
LVL 8

Expert Comment

by:et01267
ID: 34097618
Then what you need to know are the default NFS options on the Mac clients. Hmm. If they are not ideal, you need to change them (which you can do remotely via NFS Manager).

Also, I found this tidbit here

Unreliable performance, slow data transfer, and/or high load when using NFS and gigabit
This is a result of the default packet size used by NFS, which causes significant fragmentation on gigabit networks. You can modify this behavior with the rsize and wsize mount parameters. Using rsize=32768,wsize=32768 should suffice. Please note that this problem does not occur on 100Mb networks, due to the lower packet transfer speed.
The default value for NFSv4 is 32768. The maximum is 65536. Increase from the default in increments of 1024 until the maximum transfer rate is achieved.
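
The fragmentation arithmetic behind that advice can be sanity-checked with a short sketch (it assumes a 20-byte IP header with no options and an 8-byte UDP header; a back-of-envelope estimate, not a measurement):

```python
import math

IP_HEADER = 20   # bytes, assuming no IP options
UDP_HEADER = 8   # bytes, carried only in the first fragment

def udp_fragments(payload, mtu=1500):
    """Estimate IP fragments for one UDP datagram of `payload` bytes."""
    # Fragment data lengths must be multiples of 8, except the last one.
    per_frag = (mtu - IP_HEADER) // 8 * 8   # 1480 bytes at MTU 1500
    return math.ceil((payload + UDP_HEADER) / per_frag)

# A 32 KB NFS read over UDP on a standard 1500-byte MTU...
print(udp_fragments(32768))             # 23 fragments per datagram
# ...versus jumbo frames:
print(udp_fragments(32768, mtu=9000))   # 4 fragments per datagram
```

Losing any one of those fragments forces the whole datagram to be retransmitted, which is why TCP (or jumbo frames) tends to help on gigabit.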

0
 
LVL 8

Expert Comment

by:et01267
ID: 34097667
Oh, also check the nfs man page here

which says somewhere down in the middle:

 nfs.conf(5) can be used to configure some NFS client options.  In particular, nfs.client.mount.options
     can be used to specify default mount options.  This can be useful in situations where it is not easy to
     configure the command-line options.  Some NFS client options in nfs.conf(5) correspond to kernel
     configuration values which will get set by mount_nfs when performing a mount.  To update these values
     without performing a mount, use the command: mount_nfs configupdate.


0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34100556
I imagine the client NFS configuration uses all of the default values from the nfs.conf man page?

I have found some entries relating to nfs logging in kernel.log on my servers, here is some output from it (attached)

The main errors are nfsd send error 32, 35 and 4.

Any ideas on what these error codes mean?


Nov  9 10:33:14 studentdata3-4 kernel[0]: nfsd nfsd send ersnrfsed senor ndnnd3 ef es5rfrod

Nov  9 10:33:14 studentdata3-4 kernel[0]: rr senr od3 er5rs

Nov  9 10:33:14 studentdata3-4 kernel[0]: or r d35

Nov  9 10:33:14 studentdata3-4 kernel[0]: 3 5s

Nov  9 10:33:14 studentdata3-4 kernel[0]: end error 35

Nov  9 10:33:14 studentdata3-4 kernel[0]: nfsd send error 35

Nov  9 10:33:14 studentdata3-4 kernel[0]: nfsd send error 32

Nov  9 10:33:14 studentdata3-4 kernel[0]: nfsdnf sdse nsd eernrodr  error 3232

Nov  9 10:33:14 studentdata3-4 kernel[0]: nfsd snfesd sndend ernrofr  sd32

Nov  9 10:33:14 studentdata3-4 kernel[0]: er rosr end 32e

Nov  9 10:33:14 studentdata3-4 kernel[0]: rror 32

Nov  9 10:51:09 studentdata3-4 kernel[0]: systemShutdown true

Nov  9 10:51:13: --- last message repeated 1 time ---

Nov  9 10:51:13 studentdata3-4 kernel[0]: Kext loading now disabled.

Nov  9 10:51:13 studentdata3-4 kernel[0]: Kext unloading now disabled.

Nov  9 10:51:13 studentdata3-4 kernel[0]: Kext autounloading now disabled.

Nov  9 10:51:13 studentdata3-4 kernel[0]: Kernel requests now disabled.

Nov  9 10:51:14 studentdata3-4 kernel[0]: nfsd send error 4
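
For what it's worth, if those numbers are BSD errno values (an assumption - the log itself doesn't say), they can be decoded with a quick sketch:

```python
import os

# Assumption: "nfsd send error N" reports a raw errno value.
# Under BSD numbering (used by Mac OS X): 4 = EINTR, 32 = EPIPE, 35 = EAGAIN.
for code in (4, 32, 35):
    print(code, os.strerror(code))
```

If that mapping is right, EPIPE would hint at clients dropping connections mid-send, and EAGAIN at send buffers filling under load - but that is speculation.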


0
 
LVL 8

Expert Comment

by:et01267
ID: 34101818
No ideas on the error codes.  However, googling reveals that Mac NFS admins have seen this sort of log output, particularly under heavy load.

One of the things mentioned as a possible culprit for heavy loading was storing cache files (particularly browser cache files) on a remote filesystem vs local storage.

Not sure the best way around this, but possibly:
  1. Get users to turn off cache (good luck)
  2. Get users to relocate cache to a local directory, like /tmp (but the setting for this for Firefox on the Mac is in the user.js, which doesn't exist until you create it in the bowels of the preferences; not sure about other browsers)
  3. Don't mount the Home directory, but some other (sub-)directory, like Documents.
But perhaps tweaking the NFS config parameters may help.
0
 
LVL 11

Author Comment

by:gmbaxter
ID: 34119217
We redirect the Library/Caches folder and the Microsoft User Data folders locally, so this should remove most of the caching from the servers.

The client nfs.conf file has had async set and threads set to 16.
The server nfs.conf file has threads set to the maximum of 131 (I can't allocate any more) and async enabled.

Unfortunately we have to mount the entire home directory, as our users are very mobile (it's a college, so users log into computers up to 7 times per day in different rooms).

With regard to packet size, we still have some users on 100Mbit connections or wireless - about 25%, I would estimate. Would setting the packet sizes have a detrimental effect on these users?

Also, where do I set the rsize and wsize parameters? There is no entry in the nfs.conf man page for those options.
0
 
LVL 8

Expert Comment

by:et01267
ID: 34120258
Have you looked to see whether the mounts are happening over UDP or TCP?  TCP is preferred, and UDP may generate lots of fragments which will cause havoc (particularly with large frame sizes).  The article referenced below discusses diagnostics.

You have probably read this article (which is all good), but this section has some interesting bits on autonegotiation.  There are some other suggestions for things to look for in the nfsstat and netstat output. A lot may depend on whether you are getting a lot of collisions.

I'm not sure, but I think the rsize and wsize are set in the mount arguments on the client; these are the starting place for negotiation between client and server.  Some sources say to leave these alone, others say to make them big. I'm unsure whether you can set these on the server.
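
One hedged way to check what the clients have actually negotiated (command output formats differ between Mac OS X and Linux):

```shell
# List mounted NFS filesystems; on BSD/Mac OS X the negotiated options
# (tcp vs udp, rsize/wsize) appear in parentheses.
mount -t nfs

# Protocol statistics; on BSD look for "fragments dropped due to timeout"
# in the IP section, as the mount_nfs man page suggests.
netstat -s
```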

From the mount_nfs man page:

 rsize=<readsize>
             Set the read data size to the specified value.  The default is 8192 for UDP mounts and 32768
             for TCP mounts.  It should normally be a power of 2 greater than or equal to 1024.  Values
             greater than 4096 should be multiples of 4096.  It may need to be lowered for UDP mounts when
             the ``fragments dropped due to timeout'' value is getting large while actively using a mount
             point.  (Use netstat(1) with the -s option to see what the ``fragments dropped due to timeout''
             value is.)

     wsize=<writesize>
             Set the write data size to the specified value.  Ditto the comments w.r.t. the rsize option,
             but using the ``fragments dropped due to timeout'' value on the server instead of the client.
             Note that both the rsize and wsize options should only be used as a last ditch effort at
             improving performance when mounting servers that do not support TCP mounts.
0
 
LVL 11

Author Closing Comment

by:gmbaxter
ID: 34389907
Hi, thanks for the input - never managed to get this working satisfactorily, so I have split the points 50/50 between the most useful contributors.

Thanks.

BTW, we are currently integrating into Active Directory for SMB mounting - fingers crossed!
0
