Solved

NFS Storage lost Connection to vmware server

Posted on 2011-02-15
Last Modified: 2012-05-11
I recently mounted NFS storage on a VMware host to serve as the storage for one of our virtual machines. I am using a Netgear ReadyNAS 1100. The virtual server is up and running. My goal is to migrate data from another server onto this VM, but I am seeing some events that concern me.

Lost connection to server 192.168.130.112 mount point /ESX mounted as b175e60c-45a2f524-0000-000000000000 (1100-NFS Storage).
error
2/15/2011 7:26:05 AM

Restored connection to server 192.168.130.112 mount point /ESX mounted as b175e60c-45a2f524-0000-000000000000 (1100-NFS Storage).
info
2/15/2011 7:26:17 AM

This continues all day, but I am still able to open a console.
My worry is that it might cause problems when the time comes to migrate the data over.
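
To get a feel for how long each drop lasts, the event timestamps can be paired up and subtracted. The following is only a minimal sketch, assuming the events have been copied out by hand as (kind, timestamp) pairs; the function name `outage_durations` is hypothetical and not part of any VMware tool.

```python
from datetime import datetime

# Timestamps copied from the vSphere events quoted in the question.
EVENTS = [
    ("lost", "2/15/2011 7:26:05 AM"),
    ("restored", "2/15/2011 7:26:17 AM"),
]

TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def outage_durations(events):
    """Pair each 'lost' event with the next 'restored' event and
    return the length of every outage in seconds."""
    parsed = sorted(
        (datetime.strptime(stamp, TIMESTAMP_FORMAT), kind)
        for kind, stamp in events
    )
    durations = []
    lost_at = None
    for when, kind in parsed:
        if kind == "lost":
            lost_at = when
        elif kind == "restored" and lost_at is not None:
            durations.append((when - lost_at).total_seconds())
            lost_at = None
    return durations


if __name__ == "__main__":
    print(outage_durations(EVENTS))  # [12.0]
```

For the pair of events above this gives a 12-second outage. Events are sorted by time first, so it does not matter whether the log is pasted oldest-first or newest-first.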

I went to the VMware Knowledge Base and am currently following its resolution steps:

1. Verify that the ESX VMkernel IP has been granted permissions to the NFS storage
2. Verify that any firewalls between the ESX host and the NFS device have ports 111 and 2049 open
3. Verify that NFS traffic has been enabled on the ESX firewall
4. Verify that the ESX host can vmkping the NFS server
5. Verify that the NFS host can ping the ESX host
6. Verify that the virtual switch being used for storage has been configured correctly
7. Verify that the storage array is listed on the hardware compatibility list

All steps check out, except that the device is not on the compatibility list.
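
The port check in step 2 can be approximated from any ordinary machine on the storage network. This is a hedged sketch only: it tests TCP reachability of ports 111 and 2049 from wherever the script runs, not from the ESX VMkernel interface itself, so it does not replace the vmkping test in step 4. The helper name `tcp_port_open` is hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    nas = "192.168.130.112"  # NFS server address from the events above
    for port in (111, 2049):  # portmapper and NFS, as listed in step 2
        state = "open" if tcp_port_open(nas, port) else "unreachable"
        print(f"{nas}:{port} {state}")
```

A port that shows as unreachable here points at a firewall or routing problem; a port that shows as open only proves the path works from the machine running the script.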
Question by:MECIT
43 Comments
 
LVL 8

Expert Comment

by:markzz
ID: 34897256
There are many aspects you will need to verify.
The first and primary issue, of course, is that the NAS is not on the HCL.
Nonetheless, this may only mean it has not been tested by VMware (or worse).
I would suggest the first thing to look at is your network.
Ideally NFS traffic should not be routed or firewalled, so ensure your ESX servers and NAS are in the same network segment. Ideally they should also share common switching, and therefore be directly connected via a common switch or switches.
Check that all network interfaces are set to a fixed speed, ideally 1000/Full; this takes port renegotiation out of the equation. Your ESX servers should have dedicated interfaces for storage IP traffic (NFS).
If you can verify that the above are OK, it's time to look at the NAS.
Please let us know the outcome and we can start to discuss the NAS device.
 
LVL 8

Expert Comment

by:markzz
ID: 34897286
Oh, another thought on the network side.
If you have dedicated switching, the network path is fully known, and your switches support jumbo frames, it may be beneficial to enable them (dedicated NICs, switches, and NAS only).
 
LVL 28

Expert Comment

by:bgoering
ID: 34897302
Check all the networking pieces (cables, switches, etc.) to make sure there are no loose connections. Check for speed/duplex mismatches, as these can cause sporadic connectivity.

Finally, check that you have the latest firmware revision on your ReadyNAS 1100.

Good Luck
 

Author Comment

by:MECIT
ID: 34897347
I looked at the 1100 to see if there were any errors, and this is what it showed:

Auto-negotiation               0
Disconnect                     0
False carrier                  2921
Idle errors                    13369
Link failures                  0
Receive errors                 0
Symbol errors                  280
VLAN tags                      0
TCP Retransmits                21263
Unrecovered TCP Retransmits    16280

 
LVL 8

Expert Comment

by:markzz
ID: 34897438
You have some errors which should be looked at. They appear to be predominantly retransmits.
You will need to go over all of the items mentioned above and, as bgoering suggested, the firmware. I would also check the firmware on your switches and, of course, the ESX servers.
What version of ESX are you running?
 
LVL 28

Expert Comment

by:bgoering
ID: 34897505
What kind of server do you have for ESX? Is it on the HCL? How many processors and cores? How much RAM?

The high retransmit count indicates that the ReadyNAS is not receiving an ACK for a transmission in a timely manner. The unrecovered number means it didn't receive the ACK at all. If the server is undersized for the workload, that can be a contributing factor, as can the speed/duplex issues mentioned above.
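
To put the two retransmit counters in proportion, the share of retransmits that were never acknowledged can be computed directly from the ReadyNAS figures posted earlier. This is a rough sketch; `parse_counters` is a hypothetical helper that assumes one "name value" pair per line, as in the pasted output.

```python
def parse_counters(text):
    """Parse 'name   value' lines into a dict of integer counters."""
    counters = {}
    for line in text.strip().splitlines():
        # The value is the last whitespace-separated token on the line.
        name, _, value = line.strip().rpartition(" ")
        counters[name.strip()] = int(value)
    return counters


# Non-zero counters copied from the ReadyNAS network error page above.
NAS_COUNTERS = """
False carrier                  2921
Idle errors                    13369
Symbol errors                  280
TCP Retransmits                21263
Unrecovered TCP Retransmits    16280
"""

counters = parse_counters(NAS_COUNTERS)
retransmits = counters["TCP Retransmits"]
unrecovered = counters["Unrecovered TCP Retransmits"]
share = unrecovered / retransmits

print(f"{unrecovered} of {retransmits} retransmits unrecovered ({share:.1%})")
```

Roughly three out of four retransmits went unanswered, which fits a link that drops out entirely for short periods rather than one that is merely slow.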
 

Author Comment

by:MECIT
ID: 34897649
ESX 4.0

PowerEdge 2950
8 CPUs x2.659 GHz
Intel Xeon E5430 @2.66GHz
32GB RAM

Ready NAS 1100
Version 4.1.7 (latest firmware)
2TB of Storage
 

Author Comment

by:MECIT
ID: 34909638
How do I verify that my ESX servers have dedicated interfaces for NFS traffic?

I opened the ports needed for NFS traffic. Should I close them, since you mentioned that NFS traffic should not be routed or firewalled?
 
LVL 28

Expert Comment

by:bgoering
ID: 34909694
Your storage should be on a separate network from other network activity such as vMotion and virtual machine traffic. The separate network should have dedicated physical NICs on the ESX server.
 

Author Comment

by:MECIT
ID: 34909841
I have the ESX servers, NAS, and SAN on the 172 network.

NAS
eth1 -- 1000 full, 172 network
eth2 -- 1000 full, 192 network

ESX server
vmnic0 -- 1000 full, 172 network
(VM network and service console) My virtual server that connects to the NAS is on this NIC.

vmnic1 -- 1000 full, 192 network
(VMkernel)

vmnic2 -- 1000 full, 192 network
(iSCSI service console 2 and iSCSI VMkernel 2)

This is how we have it set up.
 
LVL 28

Expert Comment

by:bgoering
ID: 34909879
Maybe I have misunderstood. My understanding was that ESX was mounting the NFS share as a datastore, and that .vmdk files would be created on that datastore for VMs.

Is it that ESX is not mounting NFS directly, and instead a VM running on ESX is mounting the NFS share?
 

Author Comment

by:MECIT
ID: 34910007
The SAN is the main storage for the ESX servers. They mount directly to the SAN.
The 1100 NAS was originally used for backup purposes until we purchased a 2100 NAS that took its place.

We wanted to continue to use the NAS, so I did some research and found that I could use it for virtual machines.

So, to answer your question: yes, a VM running on ESX is mounting the NFS share.
 
LVL 28

Expert Comment

by:bgoering
ID: 34910122
OK, I think we have been working on the wrong problem. What type of NIC do you have configured in your virtual machine? Are VMware Tools installed?

If VMware Tools is not installed, please install it.
If the NIC is not VMXNET3, please change it (this requires the tools for driver support).
 

Author Comment

by:MECIT
ID: 34910217
The NIC in the virtual machine is an E1000.
I do have VMware Tools installed.

Can I add the vmxnet3 NIC and then remove the other NIC?
What exactly is vmxnet3, and why do I need to use it for the NAS?

 
LVL 28

Accepted Solution

by:
bgoering earned 500 total points
ID: 34910327
Yes, the only way to get a vmxnet3 NIC is to select that type when you add it to the system. vmxnet3 is a highly paravirtualized NIC that provides the best performance and throughput with the least overhead for a virtual machine. The E1000 is more of an emulated NIC (rather than paravirtualized), and consequently has a bit more overhead and a bit less performance.

I always recommend using the vmxnet3 NIC in VMware.
 

Author Comment

by:MECIT
ID: 34915837
I went ahead and added the vmxnet3 NIC and removed the E1000.
Is there anything else I need to do?

By making this change, will this resolve the issue?
 
LVL 28

Expert Comment

by:bgoering
ID: 34916676
Hopefully it will resolve the issue. Try it and monitor for a bit to see if the problem is gone.
 

Author Comment

by:MECIT
ID: 34916836
Will do.
I'll keep you posted.
Thanks
 

Author Comment

by:MECIT
ID: 34917403
I just checked

Restored connection to server 192.168.130.112 mount point /ESX mounted as b175e60c-45a2f524-0000-000000000000 (1100-NFS Storage).
info
2/17/2011 9:23:37 AM

Lost connection to server 192.168.130.112 mount point /ESX mounted as b175e60c-45a2f524-0000-000000000000 (1100-NFS Storage).
error
2/17/2011 9:14:06 AM
 
LVL 28

Expert Comment

by:bgoering
ID: 34917611
Please post a screenshot of your networking configuration on ESX, and also a diagram of the physical network identifying ESX, the storage, and any switches/routers/firewalls in the path.
 

Author Comment

by:MECIT
ID: 34918019
[Attached image: Network Config]
Working on an updated diagram. Will post later.

 

Author Comment

by:MECIT
ID: 34919320
Here is the diagram.

[Attached image: Diagram]
 
LVL 28

Assisted Solution

by:bgoering
bgoering earned 500 total points
ID: 34919755
OK, you are trying to access the ReadyNAS over the 192 network, the same network you are already accessing iSCSI over, apparently with no issues. Looking at the format of the message, I now think that you are mounting the ReadyNAS NFS share as a datastore on ESX, and that instead of the virtual machine mounting the storage directly, it just has a vmdk file on that datastore. On the physical side you have the switch port where the ReadyNAS plugs in (maybe misconfigured, giving a speed/duplex error), and you have the cable. I rule out physical problems on the ESX side based on there being no problems with iSCSI. You are apparently using vSwitch2 and vmnic2 on ESX.

This is a process of elimination, but at this point I am beginning to believe the device isn't on the HCL for a reason. Other things to try while troubleshooting:

1. Try a different switch port for the ReadyNAS. If this is a managed switch, can you get into it and look at port errors and such? What kind of switch is it?

2. Try a different patch cable to hook up the ReadyNAS.

3. Create a separate VMkernel port with a different IP address and present NFS there; it could be conflicting with iSCSI traffic.

If none of those help, I am pretty much out of ideas. It could be a bad switch, I guess. The retransmits are very high, which is pretty unusual for same-subnet traffic.
 

Author Comment

by:MECIT
ID: 34920485
Will test your list and post results tomorrow
 

Author Comment

by:MECIT
ID: 34929168
I haven't been able to work through your list yet, but I checked the network logs on the NAS.

This is what it shows:

Network Errors  [Ethernet 2]

Auto-negotiation               0
Disconnect                     0
False carrier                  0
Idle errors                    0
Link failures                  0
Receive errors                 0
Symbol errors                  0
VLAN tags                      0
TCP Retransmits                354
Unrecovered TCP Retransmits    265

The last time the NAS lost connection was at 11:27am; it was restored at 11:27:44am.

Are the low numbers better, or should they be at zero?
 
LVL 28

Expert Comment

by:bgoering
ID: 34929730
Low numbers are much better.
 

Author Comment

by:MECIT
ID: 34930115
Do I still need to troubleshoot, or is this normal?

I will check throughout the weekend to see if it loses connection.
 
LVL 28

Expert Comment

by:bgoering
ID: 34930879
I would keep an eye on it. An occasional retransmit is OK, but that is more commonly seen in a WAN environment than on a LAN.
 

Author Comment

by:MECIT
ID: 34942891
It loses and restores connection within certain time frames:

2/18/11 - 2:34am through 11:27am
          4:47pm through 10:25pm
2/19/11 - 3:45am through 6:25am
          2:25pm through 10:49pm
2/20/11 - 5:29am through 7:19am
          11:19am through 11:53pm
2/21/11
I switched out the cable for the ReadyNAS.
I plugged it into a new port on the switch.

The switch is a Dell PowerConnect 5424. As far as I can see, the port is not showing any errors, but there are no logs. I have to set up a remote log server.

I will be working on creating a separate VMkernel port for NFS.
 
LVL 28

Expert Comment

by:bgoering
ID: 34943536
I am not very familiar with the Dell switches (Cisco guy here), but there should be interface statistics even without a log server. You may want to clear the error counters and observe them.
 

Author Comment

by:MECIT
ID: 34951833
This is what I am getting on the switch:

Received Unicast Packets       2614321
Transmit Unicast Packets       596771
Received Nonunicast Packets    44
Transmit Nonunicast Packets    757

I created a VMkernel port on our other ESX host.
I gave it the IP address 192.168.150.100.

Then I went to the ReadyNAS and changed its address to 192.168.150.112. As soon as I did this, it went offline and the VM was not accessible.

I changed the IP on the NAS back to the original and restarted the NAS. The VM is back up.

My plan was to migrate the VM over to the other host so that it would use that dedicated NIC for the NFS VMkernel port.

Should I have migrated it over to the other host first, before changing the IP on the NAS? When the datastore grays out, should I unmount it and mount it again with the new IP address?

Would it be better to create the VMkernel port on the current host?
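
One way to sanity-check the addressing plan before touching the NAS is to confirm which addresses actually share a subnet. The sketch below uses Python's standard `ipaddress` module and ASSUMES a /24 (255.255.255.0) netmask, which is never stated in this thread; `same_subnet` is a hypothetical helper name.

```python
import ipaddress

# ASSUMPTION: a /24 netmask. Adjust PREFIX_LENGTH to match the real network.
PREFIX_LENGTH = 24


def same_subnet(addr_a, addr_b, prefix_length=PREFIX_LENGTH):
    """Return True if both IPv4 addresses fall in the same network."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_length}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_length}").network
    return net_a == net_b


if __name__ == "__main__":
    new_vmkernel = "192.168.150.100"  # new VMkernel port on the other host
    nas_original = "192.168.130.112"  # address the datastore was mounted with
    nas_changed = "192.168.150.112"   # address the NAS was changed to

    print(same_subnet(new_vmkernel, nas_changed))   # True
    print(same_subnet(new_vmkernel, nas_original))  # False
```

Under the /24 assumption, the new VMkernel port can only reach the NAS directly at its changed address, while the existing datastore was mounted against the original 192.168.130.112 address, which would explain why the VM became inaccessible the moment the NAS address changed.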
 
LVL 28

Assisted Solution

by:bgoering
bgoering earned 500 total points
ID: 34952009
Yes, I believe you will need to unmount the NFS datastore and then remount it when changing the IP address.

Just looked at the cli reference guide for your switch (http://support.dell.com/support/edocs/network/pc80xx/en/cli_ref/PDF/cli_ref.pdf) and on page 244 it starts a description of the "show interfaces counters" command that should show all of the information per port - pay special attention to the counters for error conditions such as alignment, oversized packet, MAC Rx Errors, and collisions. For example:

show interfaces counters ethernet 1/g1

will show all of the counters for port 1/g1

It looks as if you need to issue the command in privileged exec mode as

show interfaces counters ethernet 1/xg1

to get all of the information.
 

Author Comment

by:MECIT
ID: 34963287
I changed the IP address on the NAS and unmounted it.

I received an error:

[Attached image: Error when mounting]
 
LVL 28

Expert Comment

by:bgoering
ID: 34966535
If you changed the IP address on the VMkernel port, you will likely need to go into the NAS and permit the new IP.
 

Author Comment

by:MECIT
ID: 34969478
I am having an issue with the NAS. I had to reboot it, and now I am not able to get into it.
I am going to try resetting it to factory defaults and maybe setting it up again.
Will keep you posted.
 

Author Comment

by:MECIT
ID: 34979626
I had to reset the NAS back to factory defaults. I have it up and running now.
Before I wiped it, I created a backup of the ESX folder that was on the NAS, which contained the VM server files.
How would I get my server running again as a VM?
 
LVL 28

Expert Comment

by:bgoering
ID: 34979663
Restore your backup, then browse the datastore, right-click the .vmx file for each VM, and select Add to Inventory.
 

Author Comment

by:MECIT
ID: 34980729
[Attached image: uploading error]
I cannot power on the VM server.
 

Author Comment

by:MECIT
ID: 34980799
I transferred everything over except for a 750 GB vmdk file. When I try to remove it from the VM, an error pops up:

"A general system error occured: the system returned an error.  Communication with the virtual machine may have been interrupted"

Does this mean I have to move it over in order to power on the VM?

I wanted to try to create another virtual disk instead, since I did not move that one over to the NAS.
 
LVL 28

Expert Comment

by:bgoering
ID: 34983907
If a disk is missing from the configuration, it will not allow you to power on the VM. You can either move the disk over so the VM finds it or, if it isn't needed, go into Edit Settings and remove the virtual hard disk. You will probably have to select the option not to delete it from disk; otherwise it will look for the file in order to delete it, and it isn't there.
 

Author Comment

by:MECIT
ID: 34983957
I get the same error on both options.
I guess I have to recreate it or move it over during the weekend.
 

Author Comment

by:MECIT
ID: 34997007
I copied the 750 GB vmdk file over during the weekend.
When I try to power on the VM server, I receive the following error:

"Could not power on VM :Permission denied."
 

Author Comment

by:MECIT
ID: 35061358
I basically started from scratch. I feel that the ReadyNAS 1100 is not reliable for any production virtual machines.

We are going to order an MD1000 expansion for our MD3000i.

Thank you for your help.