Hi All,
This is a long-term problem that I've tried to solve using online documentation and trial and error.
Problem:
NFS performance between an Informix database server and an application server is seen to be a problem.
Symptom:
Some scripts that run overnight perform an existence test on an NFS-mounted file before reading or writing it (i.e. if [ -f $file ]). Even though the file is actually there, the test returns $?=1.
Furthermore, running nfsstat every half hour (with a daily reset) shows that the client-side getattr percentage is much higher than expected.
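To help pin down when the existence test gives a false negative, the check could be wrapped in a small helper that retries and logs each miss. This is only a diagnostic sketch, not a fix: the function name file_exists_retry, the default of 3 attempts and the 1-second delay are my own assumptions, and it assumes a POSIX shell (ksh or similar) rather than the old Bourne shell.

```shell
#!/bin/sh
# Hypothetical helper: test for a regular file, retrying a few times before giving up.
# Usage: file_exists_retry FILE [ATTEMPTS] [DELAY_SECONDS]
file_exists_retry() {
    file=$1
    attempts=${2:-3}   # assumed default, not taken from the real scripts
    delay=${3:-1}      # assumed default, in seconds
    n=1
    while [ "$n" -le "$attempts" ]; do
        if [ -f "$file" ]; then
            return 0
        fi
        # Log each miss to stderr so it can be matched against the nfsstat samples.
        echo "$(date '+%Y-%m-%d %H:%M:%S') attempt $n: $file not visible" >&2
        [ "$n" -lt "$attempts" ] && sleep "$delay"
        n=$((n + 1))
    done
    return 1
}
```

A script would then call it as: if file_exists_retry "$file"; then ... fi. If the file only shows up on a later attempt, that would point at stale cached attributes rather than a genuinely missing file.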
Documentation I have referred to:
(1)
http://www.princeton.edu/~psg/unix/Solaris/troubleshoot/nfsstat.html
In particular:
getattr > 40%:
The client attribute cache can be increased by setting the actimeo mount option. Note that this is not appropriate where the attributes change frequently, such as on a mail spool. In these cases, mount the filesystems with the noac option.
(2)
http://www.hn.edu.cn/book/NetWork/NetworkingBookshelf_2ndEd/nfs/ch18_06.htm
getattr > 60%:
Check for possible non-default attribute cache values on NFS clients. A very high percentage of getattr requests may indicate that the attribute cache window has been reduced or set to zero with the actimeo or noac mount option. It can also indicate that the NFS filesystem implementation is doing a poor job of attribute caching.
(3)
http://docs.hp.com/en/B1031-90043/ch02s03.html
Set actimeo=1 or actimeo=3 for a directory that is used and modified frequently by many NFS clients. This ensures that the file and directory attributes are kept reasonably up to date, even if they are changed frequently from various client locations.
------------
With the above in mind, several actimeo values have been tried, including actimeo=1 and actimeo=3. Both still produced a getattr share greater than 85%.
The following is example nfsstat output:
===============================================
# nfsstat -c
Client rpc:
Connection oriented:
calls       badcalls    badxids     timeouts    newcreds    badverfs
3364645     79          79          0           0           0
timers      cantconn    nomem       interrupts
0           0           0           79
Connectionless:
calls       badcalls    retrans     badxids     timeouts    newcreds
0           0           0           0           0           0
badverfs    timers      nomem       cantsend
0           0           0           0
Client nfs:
calls       badcalls    clgets      cltoomany
3313421     79          3313490     0
Version 2: (0 calls)
null        getattr     setattr     root        lookup      readlink
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
read        wrcache     write       create      remove      rename
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
link        symlink     mkdir       rmdir       readdir     statfs
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
Version 3: (3311489 calls)
null        getattr     setattr     lookup      access      readlink
0 0%        2898793 87% 1331 0%     112749 3%   29612 0%    4 0%
read        write       create      mkdir       symlink     mknod
9521 0%     85283 2%    9729 0%     0 0%        0 0%        0 0%
remove      rmdir       rename      link        readdir     readdirplus
2310 0%     0 0%        5165 0%     0 0%        123228 3%   13698 0%
fsstat      fsinfo      pathconf    commit
3667 0%     0 0%        0 0%        16399 0%
Client nfs_acl:
Version 2: (0 calls)
null        getacl      setacl      getattr     access
0 0%        0 0%        0 0%        0 0%        0 0%
Version 3: (2002 calls)
null        getacl      setacl
0 0%        2002 100%   0 0%
===============================================
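Since I am sampling nfsstat every half hour, the getattr share can be pulled out of each sample automatically instead of being read off by eye. The following is a rough sketch only: the function name getattr_share is made up, it assumes the "nfsstat -c" layout shown above (a header row of operation names followed by a row of "count pct%" pairs), and on some systems nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
#!/bin/sh
# Hypothetical helper: read `nfsstat -c` output on stdin and print the
# NFSv3 getattr share as a percentage of all NFSv3 client calls.
getattr_share() {
    awk '
        /^Version 3:/ {
            total = $3
            gsub(/[^0-9]/, "", total)   # "(3311489" -> "3311489"
            total += 0                  # force a numeric value
            in_v3 = 1
            next
        }
        in_v3 && /getattr/ {
            # Header row: find which column is named getattr.
            for (i = 1; i <= NF; i++) if ($i == "getattr") col = i
            getline                     # value row of "count pct%" pairs
            count = $(2 * col - 1)
            if (total > 0) printf "%.1f\n", 100 * count / total
            exit                        # stop before the nfs_acl section
        }
    '
}
```

Running nfsstat -c | getattr_share against the sample above gives 87.5 (2898793 of 3311489 calls), which matches the 87% that nfsstat reports.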
And mount points:
# nfsstat -m
/calypso from databaseserver:/calypso
Flags: vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
Attr cache: acregmin=3,acregmax=3,acdirmin=3,acdirmax=3
/topcall from topcall:C:\TCLFI
Flags: vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,rsize=8192,wsize=8192,retrans=5,timeo=600
Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
/opt/informix from databaseserver:/opt/informix
Flags: vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
Attr cache: acregmin=3,acregmax=3,acdirmin=3,acdirmax=3
/calypso/reports from databaseserver:/calypso/reports
Flags: vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
Attr cache: acregmin=3,acregmax=3,acdirmin=3,acdirmax=3
/calypso/archive from databaseserver:/calypso/archive
Flags: vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
Attr cache: acregmin=3,acregmax=3,acdirmin=3,acdirmax=3
===============================================
I suspect this will take a while to resolve, since testing requires outages (which are spaced two weeks apart) and further trial and error. The question might not be difficult to answer, but the person answering might not have the complete picture. Furthermore, the person answering will have to be patient while waiting for me to allocate points.
For this reason, I have decided to allocate more points.
Over time, I might allocate extra points if I don't get many submissions.
So I'll start off at just above moderately difficult: 170 points.
Thanks in advance for any contributions.