Solved

Netapp, RedHat, ReadyNAS and latency oh my!

Posted on 2012-03-20
Last Modified: 2012-04-30
Netapp 3020
ReadyNAS 1000s and 1100
Redhat Enterprise ver 3


The issue is extreme latency in copying data from an NFS netapp share to a NAS via a Redhat machine.

Background: two weeks ago we replaced a Netapp F740 with the 3020.  Config was mirrored over to the 3020 and the only issue we had that weekend was a Linux web server that needed "vers=2" added to the fstab mount lines for the Netapp NFS shares.

Currently the fstab files for all three Redhat machines are identical.  An example would be:

LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
/dev/sda3               swap                    swap    defaults        0 0
/dev/cdrom              /mnt/cdrom              udf,iso9660 noauto,owner,kudzu,$
netapp:/vol/vol0/custom /custom nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/cnc /cnc nfs vers=2,hard,intr,rw,bg 0 0
netapp:/vol/vol0/pd /pd nfs vers=2,hard,intr,rw,bg 0 0
netapp:/vol/vol0/eweb /eweb nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/web /custom/net/web nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/pd /pd nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/home /usr/people nfs vers=2,rw,hard,intr,bg 0 0
netapp:/vol/vol0/nov1 /novell nfs vers=2,hard,intr,rw,bg 0 0
netapp:/vol/vol0/eng      /eng      nfs    rw,bg,intr
10.2.1.12:/Archive4 /data/archive/archive4 nfs
10.2.2.10:/archive /data/archive/archive6 nfs rw 0 0
10.2.1.12:/RecentData /custom/archive nfs    rw,bg,intr

The 10.2.1.12 and 10.2.1.10 are two different ReadyNAS units.  All shares mount successfully; however, a 'cp' from custom/archive to RecentData takes literally 5 times longer than it did with the old Netapp, which on specs alone makes zero sense.

I have tried finding out what version of NFS the ReadyNAS units support but have been unsuccessful so far.  I've also not found out what versions RH Ent 3 supports.  I've thought about looking for an upgrade to the NIC driver in use on RH, but not being completely adept at Redhat I am very unsure of how to proceed.

CIFS shares work beautifully, copying from Netapp to NAS and vice versa. It's NFS that's giving trouble.  Browsing, 'ls'ing, mkdir's, and cp'ing single files or small folders is fine.  The latency becomes an issue in the normal procedure, which deals with multiple gigs of data, anywhere from 2-20 GB at a time.

I have probably not provided enough information here but can anyone help steer me in the right direction?

Thanks!

Question by:Ben Hart

19 Comments

Expert Comment

by:arnold
Check the network interface configuration: autonegotiate versus fixed.  Check the switch port to make sure the settings on the network interface match the switch config, i.e. both fixed or both autoneg. Make sure there are no CRC errors on the switch, which could mean there is a mismatch in the configuration.

netapp to redhat to readynas
does redhat have a single or multiple interfaces?

The redhat is working as a buffer for the data being transferred.

Expert Comment

by:Duncan Roe
I would run tcpdump and see what is happening. Expect to see retries.
(I once fixed a problem with NFS mounts in VMware that way - the emulated NIC could not handle the 8KB UDP chunks used by NFS and limiting to 1KB (in fstab) restored normal operation)

Author Comment

by:Ben Hart
Thanks guys, first off:
I looked on both the Netapp and the ReadyNAS and if I want to manually set the speed and duplex my only options are in the 100mb range.  It seems if I want gigabit I am forced to let it auto negotiate.  The switch however is a Cisco 3650g so I did statically set the port to 1gig and Full.  The nas, netapp and RH boxes all are reporting 1gig and Full if it makes any difference.  The switch showed zero CRC errors on that port, as well as the port errors on the NAS showed zero on all counts.

tcpdump scrolled too fast for me so I'm going to try piping it to a txt file then start a file copy and see if anything jumps out.

Author Comment

by:Ben Hart
Eureka!  tcpdump while trying the normal copy process on RH from netapp to nas resulted in a literal ton of fragmented datagrams..

The MTU on all hosts involved is 1500 even, now I'm confused.

Expert Comment

by:arnold
Look at the NFS windowing: rsize, wsize.
Datagrams? I thought you have NFSv2. Are the fragmented ones coming from the netapp?
TCP versus UDP.

Author Comment

by:Ben Hart
09:32:01.909481 rh2.unifiedbrands.net.2097126045 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31340:1424@0+)
09:32:01.909485 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@1424+)
09:32:01.909487 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@2848+)
09:32:01.909489 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@4272+)
09:32:01.909490 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@5696+)
09:32:01.909492 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1240@7120)
09:32:01.909518 rh2.unifiedbrands.net.2113903261 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31341:1424@0+)
09:32:01.909519 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1424@1424+)
09:32:01.909521 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1424@2848+)
09:32:01.909523 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1424@4272+)
09:32:01.909524 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1424@5696+)
09:32:01.909526 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31341:1240@7120)
09:32:01.909550 rh2.unifiedbrands.net.2130680477 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31342:1424@0+)
09:32:01.909552 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1424@1424+)
09:32:01.909554 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1424@2848+)
09:32:01.909555 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1424@4272+)
09:32:01.909557 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1424@5696+)
09:32:01.909559 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31342:1240@7120)
09:32:01.948161 rh2.unifiedbrands.net.2147457693 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31343:1424@0+)
09:32:01.948167 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1424@1424+)
09:32:01.948169 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1424@2848+)
09:32:01.948171 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1424@4272+)
09:32:01.948173 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1424@5696+)
09:32:01.948175 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31343:1240@7120)

This is what I'm seeing, from the Redhat box to the NAS.  How do I check the rsize and wsize, and on what device?
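
For anyone reading the dump above, here is a rough Python sketch (not part of the original thread) that totals the fragments. In tcpdump's notation `frag id:size@offset+`, `size` is the fragment's length excluding the IP header, `offset` is its position within the original datagram, and a trailing `+` means more fragments follow. The function name `summarize_fragments` is made up for illustration, and the sample is the six lines for datagram 31340 copied from the capture.

```python
import re
from collections import defaultdict

# The six fragments of IP datagram 31340, copied from the capture above.
CAPTURE = """\
09:32:01.909481 rh2.unifiedbrands.net.2097126045 > Archive4.difc.root01.org.nfs: 1416 write [|nfs] (frag 31340:1424@0+)
09:32:01.909485 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@1424+)
09:32:01.909487 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@2848+)
09:32:01.909489 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@4272+)
09:32:01.909490 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1424@5696+)
09:32:01.909492 rh2.unifiedbrands.net > Archive4.difc.root01.org: udp (frag 31340:1240@7120)
"""

# Matches "(frag <id>:<size>@<offset>)" with an optional trailing "+".
FRAG_RE = re.compile(r"\(frag (\d+):(\d+)@(\d+)(\+?)\)")


def summarize_fragments(text):
    """Return {ip_id: (fragment_count, total_ip_payload_bytes, complete)}."""
    frags = defaultdict(list)
    for match in FRAG_RE.finditer(text):
        ip_id, size, offset, more = match.groups()
        frags[int(ip_id)].append((int(offset), int(size), more == "+"))

    summary = {}
    for ip_id, parts in frags.items():
        parts.sort()  # order by offset within the datagram
        expected_offset = 0
        contiguous = True
        for offset, size, _more in parts:
            if offset != expected_offset:
                contiguous = False  # a fragment is missing from the capture
            expected_offset = offset + size
        last_offset, last_size, last_more = parts[-1]
        total = last_offset + last_size
        # Complete: no gaps, and the final fragment has no "more" flag.
        summary[ip_id] = (len(parts), total, contiguous and not last_more)
    return summary


for ip_id, (count, total, complete) in sorted(summarize_fragments(CAPTURE).items()):
    print(f"id {ip_id}: {count} fragments, {total} bytes of IP payload, complete={complete}")
```

For datagram 31340 this gives 6 fragments adding up to 8360 bytes with no gaps. Taking off the 8-byte UDP header leaves 8352 bytes, which would be consistent with an 8192-byte NFS write plus RPC/NFS headers. The 8 KB write size is an inference from the arithmetic; the dump itself does not state the mount options.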

Author Comment

by:Ben Hart
MTU sizes were still all 1500 on rh2, archive4 and the netapp. Right now I have them all connected to the same switch. Had to bounce the ReadyNAS so I'm waiting for it to come back up, then I'll test again.  Shouldn't make any difference, I know, but...

Expert Comment

by:arnold
You're using UDP. Fragments in this case means a fragment of a file, rather than the packet having been fragmented (part of the header has fragmented set to true).
http://nfs.sourceforge.net/nfs-howto/ar01s05.html
http://web.mit.edu/rhel-doc/5/RHEL-5-manual/Deployment_Guide-en-US/s1-nfs-client-config-options.html
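
rsize and wsize are NFS client mount options, so the place to look is the Red Hat box rather than the Netapp or the NAS. As a sketch only: on Linux the options in effect for each mount are listed in /proc/mounts, and the Python below pulls out the NFS entries. The function name `nfs_mount_options` and the sample line are hypothetical, the sample is not taken from the machines in this thread, and the exact option spellings (for example `v2` versus `vers=2`, `udp` versus `proto=udp`) vary between kernel versions.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to a dict of its mount options.

    mounts_text is expected to be in /proc/mounts format:
    device, mount point, filesystem type, comma-separated options, ...
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue  # skip blank lines and non-NFS filesystems
        mount_point, options = fields[1], fields[3]
        parsed = {}
        for opt in options.split(","):
            key, sep, value = opt.partition("=")
            # Flag-style options such as "hard" or "udp" carry no value.
            parsed[key] = value if sep else True
        result[mount_point] = parsed
    return result


# Hypothetical sample in /proc/mounts format (assumption, for illustration).
SAMPLE = """\
/dev/root / ext3 rw 0 0
netapp:/vol/vol0/custom /custom nfs rw,v2,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=netapp 0 0
"""

for mount_point, opts in nfs_mount_options(SAMPLE).items():
    print(mount_point, "rsize =", opts.get("rsize"), "wsize =", opts.get("wsize"))
```

On a real client the input would be the contents of /proc/mounts instead of `SAMPLE`, for example `nfs_mount_options(open("/proc/mounts").read())`. If rsize and wsize do not appear in the options at all, the client is using its defaults.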

Author Comment

by:Ben Hart
Thanks for the link.. should I be concerned about:
tracepath archive4
 1:  rh2 (10.2.1.41)                      asymm 65   0.017ms pmtu 552

Author Comment

by:Ben Hart
Actually.. I'm getting the 552 pmtu on any host I specify.  Surely that's not normal.

Author Comment

by:Ben Hart
Ok so I setup the required NFS mounts on a fresh Ubuntu 11.10 install, ran the exact same cp string as earlier and it completed very quickly.  I tried a tcpdump like before as well but didn't even see my Ubuntu host mentioned, possibly a config difference with that or IDK.  Either way the plan going forward is to blow away friggin old RH and replace it with Fedora 16 just to process this Archiving sequence.

Disappointing the actual issue wasn't discovered but engineering is pushing hard to get some sort of resolution asap.

Expert Comment

by:arnold
Centos 5 or 6 is an option as well.
Does the current redhat 3 have a gigE network interface?

Expert Comment

by:Duncan Roe
The tcpdump output is OK. You can expect UDP fragments - when I had a problem there were extra lines of output indicating unfinished I think (it was a long time ago).
On the new system, reverse host name look up may not have been working for your Ubuntu host but you should have seen its IP address instead. Otherwise, which addresses are you seeing?
MTU of 1500 is standard - everyone uses it.

Author Comment

by:Ben Hart
I didn't notice the IP of Ubuntu either; also there were a lot fewer entries in this dump than from the RH boxes.  But I figured that was because the Ubuntu laptop was new and RH2 has been around forever and would've been in the ARP lists for every switch and cached in a lot of servers.

The RH box had a 100mb nic, which once it's rebuilt on Fedora I'll pull that card and let it use the gigabit on-board adapter.

Expert Comment

by:arnold
The connection from the new ubuntu may have been nfsv3 rather than nfsv2 which you said was the option available.

I'd stick with the server thread using RH 5,6  or Centos 5/6 rather than the desktop version of fedora.

Author Comment

by:Ben Hart
It might have.. I did remove the 'vers=2' from the three fstab lines I added to the Ubuntu machine before mounting them.  RH3 probably doesn't support NFS3?

The dev who set up the crontab for the entire archiving process, which copies data then modifies an Informix database, will own this new box, so I told him to use whatever OS he felt comfortable with.  If it was me I'd be sticking with Ubuntu, but I'm a noob so..

Author Comment

by:Ben Hart
Any other opinions?  Should I have been worried about the very small PMTU?

The Redhat boxes did not have gig interfaces because it's RH3.. apparently.  I was told ver 3 doesn't support gigabit.  But I also discovered that the drives in the NASes are WD Greens, so it seems there's at least a handful of things that are all possibly contributing to the overall slow pace of data transfers.

Accepted Solution

by:Ben Hart
Ok well there are no other opinions I take it so I'm going to answer this by saying that the slowness must be because of the drive interface on the Netgear nas devices coupled with the 100mb interface on that RH box.
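
For a sense of scale on the 100mb interface, here is a back-of-the-envelope Python sketch (not from the original thread). It computes only the best-case time to push a given amount of data through a link at line rate. It assumes decimal gigabytes and ignores NFS/UDP/IP overhead, disk speed on the NAS, and everything else, so real copies will be slower. The function name `min_transfer_seconds` is made up for illustration.

```python
GB = 10**9  # decimal gigabyte (assumption; the thread does not say which unit)


def min_transfer_seconds(size_bytes, link_bits_per_second):
    """Best-case seconds to move size_bytes over a link at full line rate."""
    return size_bytes * 8 / link_bits_per_second


# The 2-20 GB range is the batch size mentioned in the question.
for label, rate in (("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)):
    for size_gb in (2, 20):
        secs = min_transfer_seconds(size_gb * GB, rate)
        print(f"{size_gb:>2} GB over {label}: at least {secs / 60:.1f} min")
```

By this estimate a 20 GB batch needs at least about 27 minutes on a 100 Mbit/s link and under 3 minutes on gigabit, before any overhead is counted.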

Author Closing Comment

by:Ben Hart
not the answer I was looking for, but it's all I can get apparently.