NFS Storage lost Connection to vmware server

I recently mounted NFS storage on a VMware host to serve as the storage for one of our virtual machines. I am using a Netgear ReadyNAS 1100. The virtual server is up and running. My goal is to migrate data from another server onto this VM, but I am seeing some events that concern me.

Lost connection to server 192.168.130.112 mount point /ESX mounted as b175e60c-45a2f524-0000-000000000000 (1100-NFS Storage).
error
2/15/2011 7:26:05 AM

Restored connection to server 192.168.130.112 mount point /ESX mounted as b175e60c-45a2f524-0000-000000000000 (1100-NFS Storage).
info
2/15/2011 7:26:17 AM

This continues all day, although I am still able to open a console.
My worry is that it might cause problems when the time comes to migrate the data over.
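
To gauge how long each drop lasts, the gap between a "Lost connection" event and the following "Restored connection" event can be computed from the two timestamps. This is a minimal sketch, assuming the timestamps are copied by hand from the event list in the M/D/YYYY h:mm:ss AM/PM form shown above; the function name `outage_seconds` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout as it appears in the events quoted above (assumed stable).
EVENT_FORMAT = "%m/%d/%Y %I:%M:%S %p"

def outage_seconds(lost: str, restored: str) -> float:
    """Return the number of seconds between a 'lost' and a 'restored' event."""
    lost_at = datetime.strptime(lost, EVENT_FORMAT)
    restored_at = datetime.strptime(restored, EVENT_FORMAT)
    return (restored_at - lost_at).total_seconds()

# The pair of events quoted above.
print(outage_seconds("2/15/2011 7:26:05 AM", "2/15/2011 7:26:17 AM"))  # 12.0
```

For the pair above the datastore was unreachable for about 12 seconds.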

I went to the VMware knowledge base and am currently working through its suggested resolutions:

1. Verify the ESX VMkernel IP has been granted permissions to the NFS storage
2. Verify that any firewalls between the ESX host and the NFS device have ports 111 and 2049 open
3. Verify NFS traffic has been enabled on the ESX firewall
4. Verify that the ESX host can vmkping the NFS server
5. Verify that the NFS host can ping the ESX host
6. Verify the virtual switch being used for storage has been configured correctly
7. Verify the storage array is listed on the hardware compatibility list

All steps check out, except that the NAS is not on the compatibility list.
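
Step 2 of the checklist above can be spot-checked from any machine on the storage network. The sketch below simply attempts a TCP connection to ports 111 and 2049; the helper names are made up for illustration, and it assumes the NAS serves the portmapper and NFS over TCP. It tests reachability from the machine it runs on, not from the VMkernel interface itself, so it complements vmkping rather than replacing it.

```python
import socket

# Ports named in the checklist above: 111 (portmapper) and 2049 (NFS).
NFS_PORTS = (111, 2049)

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_nfs_ports(host: str) -> dict:
    """Map each NFS-related port to whether it accepted a connection."""
    return {port: tcp_port_open(host, port) for port in NFS_PORTS}
```

Calling `check_nfs_ports("192.168.130.112")` would return a dictionary such as `{111: True, 2049: True}` when both ports accept connections.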
 
markzz commented:
There are many aspects you will need to verify.
The first and primary issue, of course, is that the NAS is not on the HCL.
Nonetheless, this may only mean it has not been tested by VMware (or worse).
I would suggest the first thing to look at is your network.
Ideally NFS traffic should not be routed or firewalled, so ensure your ESX servers and NAS are in the same network segment. Ideally they should also share common switching, and therefore be directly connected via a common switch or switches.
Check that all network interfaces are set to a given speed, ideally 1000/Full; this takes port renegotiation out of the equation. Your ESX servers should have dedicated interfaces for storage IP traffic (NFS).
If you can verify the above are OK, it's time to look at the NAS.
Please let us know the outcome and we can start to discuss the NAS device.
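
Whether two addresses sit in the same network segment, so that NFS traffic is not routed, can be checked with a few lines of Python. This is only a sketch: the /24 prefix length and the second address in each call are assumptions for illustration, since the thread does not state the subnet masks in use.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """Return True if both addresses fall in the same network of the given prefix length."""
    # strict=False lets us pass a host address and get its enclosing network.
    net_a = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in net_a

# NAS address from the event log versus two hypothetical VMkernel addresses.
print(same_subnet("192.168.130.112", "192.168.130.50"))   # True
print(same_subnet("192.168.130.112", "192.168.150.100"))  # False
```

If the check returns False, traffic between the two hosts has to cross a router, which is the situation to avoid for NFS storage.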
 
markzz commented:
Oh, another thought on the network side.
If you have dedicated switching, the network path is known to the absolute degree, and your switches support jumbo frames, it may be beneficial to enable them (dedicated NICs, switches and NAS only).
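
If jumbo frames are enabled, every device in the path has to agree on the MTU. A common way to test this is a ping with the don't-fragment flag set and the largest payload that fits in one frame. The arithmetic is sketched below, assuming IPv4 without IP options and an MTU of 9000 for jumbo frames; the actual MTU supported by the NAS and switches would need to be confirmed.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header

def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in a single unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

print(max_ping_payload(1500))  # 1472
print(max_ping_payload(9000))  # 8972
```

If a don't-fragment ping of that size fails while a normal ping succeeds, some device in the path is not passing jumbo frames.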
 
bgoering commented:
Check all the networking pieces (cables, switches, etc.) to make sure there are no loose connections. Check for speed/duplex mismatches, as these can cause sporadic connectivity.

Finally, check that you have the latest firmware revision on your ReadyNAS 1100.

Good luck
 
MECIT (author) commented:
I looked at the 1100 to see if there were any errors, and this is what it showed:

Auto-negotiation              0
Disconnect                    0
False carrier                 2921
Idle errors                   13369
Link failures                 0
Receive errors                0
Symbol errors                 280
VLAN tags                     0
TCP Retransmits               21263
Unrecovered TCP Retransmits   16280
 
markzz commented:
You have some errors which should be looked at. They appear to be predominantly retransmits.
You will need to go over all of the items mentioned above and, as bgoering suggested, the firmware. Could I add: check the firmware on your switches and, of course, the ESX servers.
What version of ESX are you running?
 
bgoering commented:
What kind of server do you have for ESX? Is it on the HCL? How many processors and cores? How much RAM?

The high retransmits indicate that the ReadyNAS is not receiving an ACK for a transmission in a timely manner. The unrecovered number means it didn't receive the ACK at all. If the server is undersized for the workload, that can be a contributing factor, as can the speed/duplex settings mentioned above.
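
To put the posted counters in proportion, the share of retransmits that were never acknowledged can be worked out directly. This is a rough sketch only: the counters are cumulative since some unknown point in time, and the function name is made up for illustration.

```python
def unrecovered_ratio(retransmits: int, unrecovered: int) -> float:
    """Fraction of TCP retransmits that never received an ACK."""
    if retransmits == 0:
        return 0.0
    return unrecovered / retransmits

# Counters posted from the ReadyNAS 1100 above.
ratio = unrecovered_ratio(21263, 16280)
print(f"{ratio:.1%}")  # 76.6%
```

Roughly three out of four retransmits going unanswered suggests the peer is unreachable for stretches of time rather than just slow to respond.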
 
MECIT (author) commented:
ESX 4.0

PowerEdge 2950
8 CPUs x 2.659 GHz
Intel Xeon E5430 @ 2.66GHz
32GB RAM

ReadyNAS 1100
Version 4.1.7 (latest firmware)
2TB of storage
 
MECIT (author) commented:
How do I verify that my ESX servers have dedicated interfaces for NFS traffic?

I opened the ports needed for NFS traffic. Should I close them, since you mentioned it should not be routed or firewalled?
 
bgoering commented:
Your storage should be on a separate network from other network activity like vMotion and virtual machine traffic. The separate network should have dedicated physical NICs on the ESX server.
 
MECIT (author) commented:
I have the ESX servers, NAS and SAN on the 172 network.

NAS
eth1 -- 1000 full, 172 network
eth2 -- 1000 full, 192 network

ESX server
vmnic0 -- 1000 full, 172 network
(VM network and service console) My virtual server that connects to the NAS is on this NIC.

vmnic1 -- 1000 full, 192 network
(VMkernel)

vmnic2 -- 1000 full, 192 network
(iSCSI service console 2 and iSCSI VMkernel 2)

This is how we have it set up.
 
bgoering commented:
Maybe I have misunderstood. My understanding was that ESX was mounting the NFS share as a datastore, and that .vmdk files for the VMs would be created on that datastore.

Is it that ESX is not mounting NFS directly, and instead a VM running on ESX is mounting the NFS share?
 
MECIT (author) commented:
The SAN is the main storage for the ESX servers. They mount the SAN directly.
The 1100 NAS was originally used for backups until we purchased a 2100 NAS that took its place.

We wanted to keep using the NAS, so I did some research and found that I could use it for virtual machines.

So to answer your question: yes, a VM running on ESX is mounting the NFS share.
 
bgoering commented:
OK, I think we have been working on the wrong problem. What type of NIC do you have configured in your virtual machine? Are VMware Tools installed?

If VMware Tools is not installed, please install it.
If the NIC is not VMXNET3, please change it (this requires the tools for driver support).
 
MECIT (author) commented:
The NIC in the virtual machine is an E1000.
I do have VMware Tools installed.

Can I add the vmxnet3 and then remove the other NIC?
What exactly is the vmxnet3 and why do I need to use it for the NAS?
 
bgoering commented:
Yes, the only way to get a vmxnet3 NIC is to select that type when you add it to the system. vmxnet3 is a highly paravirtualized NIC that provides the best performance and throughput with the least amount of overhead for a virtual machine. The E1000 is more of an emulated NIC (rather than paravirtualized) and consequently has a bit more overhead and a bit less performance.

I always recommend using the vmxnet3 NIC in VMware.
 
MECIT (author) commented:
I went ahead and added the vmxnet3 NIC and removed the E1000.
Is there anything else I need to do?

Will making this change resolve the issue?
 
bgoering commented:
Hopefully it will resolve the issue. Try it and monitor for a bit to see if the problem is gone.
 
MECIT (author) commented:
Will do.
I'll keep you posted.
Thanks
 
MECIT (author) commented:
I just checked:

Restored connection to server 192.168.130.112 mount point /ESX mounted as b175e60c-45a2f524-0000-000000000000 (1100-NFS Storage).
info
2/17/2011 9:23:37 AM

Lost connection to server 192.168.130.112 mount point /ESX mounted as b175e60c-45a2f524-0000-000000000000 (1100-NFS Storage).
error
2/17/2011 9:14:06 AM
 
bgoering commented:
Please post a screenshot of your networking configuration on ESX, and also a diagram of the physical network identifying the ESX host, the storage, and any switches, routers or firewalls in the path.
 
MECIT (author) commented:
[Attachment: Network Config]
Working on an updated diagram. Will post later.
 
MECIT (author) commented:
Here is the diagram.

[Attachment: Diagram]
 
bgoering commented:
OK, you are trying to access the ReadyNAS over the 192 network, the same network you are already using for iSCSI, apparently with no issues. Looking at the format of the message, I now think you are mounting the ReadyNAS NFS share as a datastore on ESX, and instead of the virtual machine mounting the storage directly, it just has a vmdk file on that datastore. On the physical side you have the switch port where the ReadyNAS plugs in (possibly misconfigured, giving a speed/duplex error), and you have the cable. I rule out the physical side on the ESX host based on there being no problems with iSCSI. You are apparently using vSwitch2 and vmnic2 on ESX.

This is a process of elimination, but at this point I am beginning to believe the device isn't on the HCL for a reason. Other things to try while troubleshooting:

1 a different switch port for the ReadyNAS. If this is a managed switch, can you get into it and look at port errors and such? What kind of switch is it?

2 try a different patch cable to hook up the ReadyNAS

3 create a separate vmkernel port with a different IP address and present NFS there; it could be conflicting with iSCSI traffic

If none of those help I am pretty much out of ideas. It could be a bad switch, I guess... the retransmits are very high, which is pretty unusual for same-subnet traffic.
 
MECIT (author) commented:
Will test your list and post results tomorrow.
 
MECIT (author) commented:
I haven't been able to work through your list yet, but I checked the network logs on the NAS.

This is what it shows:

Network Errors [Ethernet 2]

Auto-negotiation              0
Disconnect                    0
False carrier                 0
Idle errors                   0
Link failures                 0
Receive errors                0
Symbol errors                 0
VLAN tags                     0
TCP Retransmits               354
Unrecovered TCP Retransmits   265

The last time the NAS lost connection was at 11:27am; it was restored at 11:27:44am.

Are the low numbers better, or should they be at zero?
 
bgoering commented:
Low numbers are much better.
 
MECIT (author) commented:
Do I still need to troubleshoot, or is this normal?

I will check throughout the weekend to see if it loses connection.
 
bgoering commented:
I would keep an eye on it. An occasional retransmit is OK, but that is more commonly seen in a WAN environment than on a LAN.
 
MECIT (author) commented:
It loses and restores connection within certain time frames:

2/18/11 - 2:34am through 11:27am
          4:47pm through 10:25pm
2/19/11 - 3:45am through 6:25am
          2:25pm through 10:49pm
2/20/11 - 5:29am through 7:19am
          11:19am through 11:53pm
2/21/11
I switched out the cable for the ReadyNAS.
I plugged it into a new port on the switch.

The switch is a Dell PowerConnect 5424. As far as I can see, the port is not showing any errors, but there are no logs. I have to set up a remote log server.

I will be working on creating a separate vmkernel port for NFS.
 
bgoering commented:
I am not really familiar with the Dell switches (Cisco guy here), but there should be interface statistics even without a log server. You may want to clear the error counters and observe them.
 
MECIT (author) commented:
This is what I am getting on the switch:

Received Unicast Packets       2614321
Transmit Unicast Packets       596771
Received Nonunicast Packets    44
Transmit Nonunicast Packets    757


I created a vmkernel port on our other ESX host.
I gave it the IP address 192.168.150.100.

Then I went to the ReadyNAS and changed its address to 192.168.150.112. As soon as I did this it went offline and the VM was not accessible.

I changed the IP on the NAS back to its original address and restarted the NAS. The VM is back up.

My plan was to migrate the VM over to the other host so that it would use that dedicated NIC for the NFS vmkernel port.

Should I have migrated it over to the other host before changing the IP on the NAS? When the datastore grays out, should I unmount it and mount it again with the new IP address?

Would it be better to create the vmkernel port on the current host?
 
bgoering commented:
Yes, I believe you will need to unmount the NFS datastore and then remount it when changing the IP address.

I just looked at the CLI reference guide for your switch (http://support.dell.com/support/edocs/network/pc80xx/en/cli_ref/PDF/cli_ref.pdf). On page 244 it starts a description of the "show interfaces counters" command, which should show all of the information per port. Pay special attention to the counters for error conditions such as alignment, oversized packet, MAC Rx errors, and collisions. For example:

show interfaces counters ethernet 1/g1

will show all of the counters for port 1/g1.

It looks as if you need to issue the command in privileged exec mode as

show interfaces counters ethernet 1/xg1

to get all of the information.
 
MECIT (author) commented:
I changed the IP address on the NAS and unmounted it.

I received an error:

[Attachment: Error when mounting]
 
bgoering commented:
If you changed the IP address on the vmkernel port, you will likely need to go into the NAS and permit the new IP.
 
MECIT (author) commented:
I am having an issue with the NAS. I had to reboot it and now I am not able to get into it.
I am going to try resetting it to factory defaults and maybe setting it up again.
Will keep you posted.
 
MECIT (author) commented:
I had to reset the NAS to factory defaults. I have it up and running now.
Before I wiped it, I made a backup of the ESX folder on the NAS, which contained the VM server files.
How do I get my server running again as a VM?
 
bgoering commented:
Restore your backup, then browse the datastore, right-click the vmx file for each VM, and select "Add to Inventory".
 
MECIT (author) commented:
[Attachment: uploading error]
I cannot power on the VM server.
 
MECIT (author) commented:
I transferred everything over except for a 750 GB vmdk file. When I try to remove it from the VM, an error pops up:

"A general system error occured: the system returned an error.  Communication with the virtual machine may have been interrupted"

Does this mean I have to move the file over before I can power the VM on?

I wanted to try to create a new virtual disk instead, since I did not move this one over to the NAS.
 
bgoering commented:
If a disk is missing from the configuration, it will not allow you to power on the VM. You can either move the disk over so the VM finds it or, if it isn't needed, go into Edit Settings and remove the virtual hard disk. You will probably have to select the option not to delete it from disk; otherwise it will look for the file in order to delete it, and it isn't there.
 
MECIT (author) commented:
I get the same error with both options.
I guess I have to recreate the disk or move it over during the weekend.
 
MECIT (author) commented:
I copied the 750 GB vmdk file over during the weekend.
When I try to power on the VM server I receive the following error:

"Could not power on VM :Permission denied."
 
MECIT (author) commented:
I basically started from scratch. I feel that the ReadyNAS 1100 is not reliable enough for production virtual machines.

We are going to order an MD1000 expansion for our MD3000i.

Thank you for your help.