Netapp, RedHat, ReadyNAS and latency oh my!

Netapp 3020
ReadyNAS 1000s and 1100
Redhat Enterprise ver 3


The issue is extreme latency in copying data from an NFS netapp share to a NAS via a Redhat machine.

Background: two weeks ago we replaced a Netapp F740 with the 3020.  The config was mirrored over to the 3020, and the only issue we had that weekend was that a Linux web server needed "vers=2" added to the fstab mount lines for the Netapp NFS shares.

Currently the fstab files for all three Redhat machines are identical.  An example would be:

LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
/dev/sda3               swap                    swap    defaults        0 0
/dev/cdrom              /mnt/cdrom              udf,iso9660 noauto,owner,kudzu,$
netapp:/vol/vol0/custom /custom nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/cnc /cnc nfs vers=2,hard,intr,rw,bg 0 0
netapp:/vol/vol0/pd /pd nfs vers=2,hard,intr,rw,bg 0 0
netapp:/vol/vol0/eweb /eweb nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/web /custom/net/web nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/pd /pd nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/home /usr/people nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/nov1 /novell nfs vers=2,hard,intr,rw,bg 0 0
netapp:/vol/vol0/eng      /eng      nfs    rw,bg,intr
10.2.1.12:/Archive4 /data/archive/archive4 nfs
10.2.2.10:/archive /data/archive/archive6 nfs rw 0 0
10.2.1.12:/RecentData /custom/archive nfs    rw,bg,intr
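
For a quick comparison of the NFS entries above, here is a minimal Python sketch that lists which mounts pin an NFS version, which set rsize/wsize, and which are missing fstab fields. The function names are hypothetical and the three sample lines are copied from the listing purely for illustration; this is not a tool that ships with any of these systems.

```python
def parse_fstab_line(line):
    """Return a dict for one fstab entry, or None for blanks, comments, short lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 3:
        return None
    return {
        "device": fields[0],
        "mount_point": fields[1],
        "fs_type": fields[2],
        "options": fields[3].split(",") if len(fields) > 3 else [],
        "field_count": len(fields),
    }


def audit_nfs_entries(fstab_text):
    """Map each NFS mount point to a list of notes worth a second look."""
    findings = {}
    for line in fstab_text.splitlines():
        entry = parse_fstab_line(line)
        if entry is None or entry["fs_type"] != "nfs":
            continue
        opts = entry["options"]
        notes = []
        if not any(o.startswith(("vers=", "nfsvers=")) for o in opts):
            notes.append("no NFS version pinned")
        if not any(o.startswith(("rsize=", "wsize=")) for o in opts):
            notes.append("rsize/wsize left at defaults")
        if entry["field_count"] < 6:
            notes.append("fewer than six fields")
        findings[entry["mount_point"]] = notes
    return findings


# Three entries copied from the fstab in the question.
SAMPLE = """\
netapp:/vol/vol0/custom /custom nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/eng      /eng      nfs    rw,bg,intr
10.2.1.12:/Archive4 /data/archive/archive4 nfs
"""

for mount_point, notes in audit_nfs_entries(SAMPLE).items():
    print(mount_point, "->", ", ".join(notes) or "nothing flagged")
```

On the sample, /custom is flagged only for default rsize/wsize, while /eng and /data/archive/archive4 also lack a pinned version and the trailing fields, so the ReadyNAS mounts may not be negotiating the same NFS settings as the Netapp ones.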

The 10.2.1.12 and 10.2.1.10 are two different ReadyNAS units.  All shares mount successfully; however, a 'cp' from custom/archive to RecentData takes literally 5 times longer than it did with the old Netapp, which on specs alone makes zero sense.

I have tried finding out what version of NFS the ReadyNAS units support but have been unsuccessful so far.  I've also not found out what versions RH Ent 3 supports.  I've thought about looking for an upgrade to the NIC driver in use on RH, but not being completely adept at Redhat I am very unsure of how to proceed.

CIFS shares work beautifully, copying from Netapp to NAS and vice versa. It's NFS that's giving trouble.  Browsing, 'ls'ing, mkdir's, and cp'ing single files or small folders are fine.  But the normal procedure, where the latency is an issue, deals with multiple gigabytes of data, anywhere from 2-20 GB at a time.

I have probably not provided enough information here but can anyone help steer me in the right direction?

Thanks!
Ben Hart Asked:

arnold Commented:
Check whether the network interfaces are set to autonegotiate or to a fixed speed/duplex, and make sure each interface's setting matches the switch port config, i.e. both fixed or both autonegotiating. Also make sure there are no CRC errors on the switch, which could mean that there is a mismatch in the configuration.

The data path is Netapp to Redhat to ReadyNAS.
Does the Redhat box have a single interface or multiple interfaces?

The redhat is working as a buffer for the data being transferred.
Duncan Roe (Software Developer) Commented:
I would run tcpdump and see what is happening. Expect to see retries.
(I once fixed a problem with NFS mounts in VMware that way: the emulated NIC could not handle the 8KB UDP chunks used by NFS, and limiting them to 1KB (in fstab) restored normal operation.)
Ben Hart (Author) Commented:
Thanks guys, first off:
I looked on both the Netapp and the ReadyNAS, and if I want to manually set the speed and duplex my only options are in the 100 Mb range.  It seems that if I want gigabit I am forced to let it autonegotiate.  The switch, however, is a Cisco 3650g, so I did statically set the port to 1 gig and full duplex.  The NAS, Netapp and RH boxes are all reporting 1 gig and full duplex, if it makes any difference.  The switch showed zero CRC errors on that port, and the port error counters on the NAS showed zero on all counts.

tcpdump scrolled too fast for me so I'm going to try piping it to a txt file then start a file copy and see if anything jumps out.
Ben Hart (Author) Commented:
Eureka!  tcpdump while trying the normal copy process on RH from netapp to nas resulted in a literal ton of fragmented datagrams..

The MTU on all hosts involved is 1500 even, now I'm confused.
arnold Commented:
Look at the NFS transfer sizes: rsize and wsize.
Datagrams? I thought you had NFSv2. Are the fragments coming from the Netapp?
Also consider TCP versus UDP.
Ben Hart (Author) Commented:
09:32:01.909481 rh2.unifiedbrands.net.2097126045 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31340:1424@0+)
09:32:01.909485 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@1424+)
09:32:01.909487 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@2848+)
09:32:01.909489 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@4272+)
09:32:01.909490 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@5696+)
09:32:01.909492 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1240@7120)
09:32:01.909518 rh2.unifiedbrands.net.2113903261 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31341:1424@0+)
09:32:01.909519 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1424@1424+)
09:32:01.909521 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1424@2848+)
09:32:01.909523 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1424@4272+)
09:32:01.909524 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1424@5696+)
09:32:01.909526 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1240@7120)
09:32:01.909550 rh2.unifiedbrands.net.2130680477 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31342:1424@0+)
09:32:01.909552 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1424@1424+)
09:32:01.909554 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1424@2848+)
09:32:01.909555 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1424@4272+)
09:32:01.909557 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1424@5696+)
09:32:01.909559 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1240@7120)
09:32:01.948161 rh2.unifiedbrands.net.2147457693 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31343:1424@0+)
09:32:01.948167 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1424@1424+)
09:32:01.948169 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1424@2848+)
09:32:01.948171 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1424@4272+)
09:32:01.948173 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1424@5696+)
09:32:01.948175 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1240@7120)

This is what I'm seeing from the Redhat box to the NAS.  How do I check the rsize and wsize, and on what device?
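
On the rsize/wsize question: these are NFS client mount options, so they would be checked on the machine that mounts the share (here the Redhat box), not on the NAS. A rough Python sketch of pulling them out of /proc/mounts-style text follows. The function name is hypothetical, the sample line is made up for illustration (its values are not taken from rh2), and the exact option spelling in /proc/mounts varies between kernel versions, so treat the parsing as an assumption.

```python
def nfs_transfer_settings(mounts_text):
    """Map each NFS mount point to its rsize, wsize and transport, where listed."""
    settings = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        found = {}
        for option in fields[3].split(","):
            key, _, value = option.partition("=")
            if key in ("rsize", "wsize") and value.isdigit():
                found[key] = int(value)
            elif key == "proto":
                found["transport"] = value
            elif option in ("udp", "tcp"):
                found["transport"] = option
        settings[fields[1]] = found
    return settings


# Hypothetical /proc/mounts content, for illustration only.
SAMPLE_MOUNTS = (
    "10.2.1.12:/Archive4 /data/archive/archive4 nfs "
    "rw,v2,rsize=8192,wsize=8192,hard,intr,udp,addr=10.2.1.12 0 0\n"
    "/dev/sda1 / ext3 rw 0 0\n"
)

print(nfs_transfer_settings(SAMPLE_MOUNTS))
```

On the client itself the input would be the contents of /proc/mounts. Smaller values can be requested with rsize= and wsize= in the options column of fstab, which is the kind of limit Duncan Roe described earlier in the thread.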
Ben Hart (Author) Commented:
MTU sizes were still all 1500 on rh2, archive4 and the Netapp.  Right now I have them all connected to the same switch. I had to bounce the ReadyNAS, so I'm waiting for it to come back up and then I'll test again.  It shouldn't make any difference, I know, but...
arnold Commented:
You're using UDP. Fragments in this case mean fragments of a file rather than a fragmented packet (where part of the header has the fragmented flag set to true).
http://nfs.sourceforge.net/nfs-howto/ar01s05.html
http://web.mit.edu/rhel-doc/5/RHEL-5-manual/Deployment_Guide-en-US/s1-nfs-client-config-options.html
Ben Hart (Author) Commented:
Thanks for the link.. should I be concerned about:
tracepath archive4
 1:  rh2 (10.2.1.41)                      asymm 65   0.017ms pmtu 552

Ben Hart (Author) Commented:
Actually, I'm getting the 552 pmtu on any host I specify.  Surely that's not normal.
Ben Hart (Author) Commented:
Ok, so I set up the required NFS mounts on a fresh Ubuntu 11.10 install, ran the exact same cp command as earlier, and it completed very quickly.  I tried a tcpdump like before as well but didn't even see my Ubuntu host mentioned, possibly because of a config difference, IDK.  Either way, the plan going forward is to blow away the friggin old RH and replace it with Fedora 16 just to process this archiving sequence.

Disappointing the actual issue wasn't discovered but engineering is pushing hard to get some sort of resolution asap.
arnold Commented:
Centos 5 or 6 is an option as well.
Does the current redhat 3 have a gigE network interface?
Duncan Roe (Software Developer) Commented:
The tcpdump output is OK. You can expect UDP fragments. When I had a problem, there were extra lines of output indicating "unfinished", I think (it was a long time ago).
On the new system, reverse host name lookup may not have been working for your Ubuntu host, but you should have seen its IP address instead. Otherwise, which addresses are you seeing?
An MTU of 1500 is standard - everyone uses it.
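
To make the point about expected fragments concrete, here is a small Python sketch that groups tcpdump fragment lines by datagram ID and adds up their sizes. The function name and regular expression are hypothetical; the six sample lines are copied from the dump posted above, and the pattern assumes the `(frag id:size@offset+)` notation shown there.

```python
import re
from collections import defaultdict

# Matches e.g. "(frag 31340:1424@0+)"; a trailing "+" means more fragments follow.
FRAG_RE = re.compile(r"\(frag (\d+):(\d+)@(\d+)(\+?)\)")


def summarize_fragments(dump_text):
    """Per datagram ID: fragment count, reassembled IP payload size, completeness."""
    by_id = defaultdict(list)
    for line in dump_text.splitlines():
        match = FRAG_RE.search(line)
        if match:
            frag_id, size, offset, more = match.groups()
            by_id[int(frag_id)].append((int(offset), int(size), more == "+"))

    summary = {}
    for frag_id, frags in by_id.items():
        frags.sort()
        expected_offset = 0
        contiguous = True
        for offset, size, _more in frags:
            if offset != expected_offset:
                contiguous = False
            expected_offset = offset + size
        last_expects_more = frags[-1][2]
        summary[frag_id] = {
            "fragments": len(frags),
            "ip_payload_bytes": expected_offset,
            "complete": contiguous and not last_expects_more,
        }
    return summary


# One datagram's worth of lines, copied from the dump in this thread.
SAMPLE_DUMP = r"""
09:32:01.909481 rh2.unifiedbrands.net.2097126045 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31340:1424@0+)
09:32:01.909485 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@1424+)
09:32:01.909487 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@2848+)
09:32:01.909489 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@4272+)
09:32:01.909490 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@5696+)
09:32:01.909492 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1240@7120)
"""

print(summarize_fragments(SAMPLE_DUMP))
```

For these lines it reports six fragments adding up to 8360 bytes of IP payload, with no gaps. That is consistent with an 8 KB NFS write plus UDP and RPC headers being split to fit a 1500-byte MTU, which is normal behaviour for NFS over UDP rather than a fault in itself.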
Ben Hart (Author) Commented:
I didn't notice the IP of the Ubuntu host either; also, there were a lot fewer entries in this dump than from the RH boxes.  But I figured that was because the Ubuntu laptop was new, while RH2 has been around forever and would've been in the ARP lists for every switch and cached in a lot of servers.

The RH box had a 100 Mb NIC; once it's rebuilt on Fedora I'll pull that card and let it use the onboard gigabit adapter.
arnold Commented:
The connection from the new Ubuntu machine may have been NFSv3 rather than NFSv2, which you said was the option available.

I'd stick with a server distribution such as RH 5/6 or CentOS 5/6 rather than the desktop version of Fedora.
Ben Hart (Author) Commented:
It might have.. I did remove the 'vers=2' from the three fstab lines I added to the Ubuntu machine before mounting them.  RH3 probably doesn't support NFS3?

The dev who set up the crontab for the entire archiving process, which copies data and then modifies an Informix database, will own this new box, so I told him to use whatever OS he felt comfortable with.  If it was me I'd be sticking with Ubuntu, but I'm a noob so..
Ben Hart (Author) Commented:
Any other opinions?  Should I have been worried about the very small PMTU?

The Redhat boxes did not have gig interfaces because it's RH3; apparently, I was told, ver3 doesn't support gigabit.  But I also discovered that the drives in the NASes are WD Greens, so it seems there's at least a handful of things that are all possibly contributing to the overall slow pace of data transfers.
Ben Hart (Author) Commented:
Ok, well, there are no other opinions I take it, so I'm going to answer this by saying that the slowness must be because of the drive interface on the Netgear NAS devices coupled with the 100 Mb interface on that RH box.
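
As a rough sanity check on the 100 Mb interface as a bottleneck, the best-case wire time for the 2-20 GB batches mentioned in the question can be estimated. This Python sketch is plain arithmetic under stated assumptions (decimal gigabytes, a link running at its full nominal rate, no protocol or disk overhead), so real copies will be slower than these figures. The function name is hypothetical.

```python
def min_transfer_seconds(size_gb, link_mbps):
    """Lower bound on copy time: payload bits divided by the nominal link rate."""
    size_bits = size_gb * 1e9 * 8          # decimal gigabytes to bits
    link_bits_per_second = link_mbps * 1e6  # megabits per second to bits per second
    return size_bits / link_bits_per_second


for size_gb in (2, 20):
    for link_mbps in (100, 1000):
        minutes = min_transfer_seconds(size_gb, link_mbps) / 60
        print(f"{size_gb} GB over {link_mbps} Mb/s: at least {minutes:.1f} min")
```

By this estimate a 20 GB batch needs at least about 27 minutes at 100 Mb/s versus under 3 minutes at gigabit, before any NFS or disk overhead is counted.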
Ben Hart (Author) Commented:
Not the answer I was looking for, but it's all I can get apparently.