Cyril_M
asked on
NIC of Linux RHEL6 VM goes to sleep after some idle time. Cannot ping from Cisco ASA5505
Hi Experts,
I have a dedicated ESXi server hosted by my French provider (OVH).
On that server I have 4 VMs. One of them is a Red Hat Linux 6 server (172.16.10.54/12) behind a Cisco ASA 5505 (172.16.10.254/12) configured with NAT.
If this VM stays idle for a while, I can no longer ping it from outside or from the Cisco firewall.
I checked the ARP table on the Cisco when the problem occurred; the entry corresponding to my server is there, but I still can't ping the server.
If I ping my firewall from that VM, it "wakes up the network", and then I can ping the server from the firewall.
One more detail: everything is configured through a logical VLAN.
The other VMs don't have the problem, but they are always active.
Can you help me understand where the problem is and how to get rid of it?
Thanks.
Do you have iptables enabled on that RHEL box?
ASKER
I also tried with iptables off. Same problem.
A few minutes ago, I ran "tcpdump -n -p icmp" while the problem was occurring. I could not ping the VM from the firewall, and tcpdump didn't capture anything.
I am now testing the network driver. I changed the NIC type from VMXNET3 to E1000 and am waiting to see whether the problem occurs again.
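As a complementary check from the VM side, the guest's own neighbour table can be inspected for the firewall's entry while the problem is occurring. The following is a minimal sketch, assuming a Linux guest that exposes /proc/net/arp; the function names are hypothetical, and 172.16.10.254 is the firewall address given in the question.

```python
ATF_COM = 0x02  # "completed entry" flag, as defined in <linux/if_arp.h>


def parse_proc_net_arp(text):
    """Parse the contents of /proc/net/arp into a list of dicts."""
    entries = []
    for line in text.strip().splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip malformed or truncated rows
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        entries.append({
            "ip": ip,
            "flags": int(flags, 16),
            "mac": mac,
            "device": device,
        })
    return entries


def gateway_entry_complete(entries, gateway_ip):
    """Return True if gateway_ip has a completed (resolved) ARP entry."""
    for entry in entries:
        if entry["ip"] == gateway_ip:
            return bool(entry["flags"] & ATF_COM)
    return False


def read_arp_table(path="/proc/net/arp"):
    """Read and parse the kernel ARP table of the machine running this."""
    with open(path) as handle:
        return parse_proc_net_arp(handle.read())
```

Calling gateway_entry_complete(read_arp_table(), "172.16.10.254") on the VM shows whether the guest still holds a resolved entry for the firewall. It does not reveal anything about the switch or the ESXi host in between.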
Anything in /var/log/messages? What about the logs on the ASA?
ASKER
I can't see anything meaningful there.
Nothing in the ASA logs.
Changing the network card from VMXNET3 to E1000 didn't solve the problem.
Here is an excerpt from /var/log/messages (I had the problem around 14:35, when I tried "tcpdump -n").
Apr 3 14:10:23 serveur clamd[1375]: HTML support enabled.
Apr 3 14:10:23 serveur clamd[1375]: Self checking every 600 seconds.
Apr 3 14:10:25 serveur saslauthd[1590]: detach_tty : master pid is: 1590
Apr 3 14:10:25 serveur saslauthd[1590]: ipc_init : listening on socket: /var/run/saslauthd/mux
Apr 3 14:10:36 serveur named[1309]: validating @0x7f6654019c90: org SOA: got insecure response; parent indicates it should be secure
Apr 3 14:10:36 serveur named[1309]: error (no valid RRSIG) resolving 'linux.org/DS/IN': 199.249.120.1#53
Apr 3 14:10:37 serveur rhnsd[1721]: Red Hat Network Services Daemon starting up, check in interval 240 minutes.
Apr 3 14:18:14 serveur named[1309]: client 127.0.0.1#34706: RFC 1918 response from Internet for 54.10.16.172.in-addr.arpa
Apr 3 14:23:03 serveur clamd[1375]: No stats for Database check - forcing reload
Apr 3 14:23:03 serveur clamd[1375]: Reading databases from /var/lib/clamav
Apr 3 14:23:07 serveur clamd[1375]: Database correctly reloaded (2078365 signatures)
Apr 3 14:35:37 serveur kernel: Bluetooth: Core ver 2.15
Apr 3 14:35:37 serveur kernel: NET: Registered protocol family 31
Apr 3 14:35:37 serveur kernel: Bluetooth: HCI device and connection manager initialized
Apr 3 14:35:37 serveur kernel: Bluetooth: HCI socket layer initialized
Apr 3 14:35:49 serveur kernel: device eth0 entered promiscuous mode
Apr 3 14:35:53 serveur kernel: device eth0 left promiscuous mode
Apr 3 14:38:04 serveur clamd[1375]: SelfCheck: Database status OK.
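To narrow an excerpt like the one above down to the moment of the outage, the lines can be filtered by time of day and by program name. This is only a rough sketch, assuming the classic syslog line layout shown in the excerpt; the helper names are hypothetical, and the filter ignores the date and does not handle windows that cross midnight.

```python
import re

# Matches lines such as:
#   Apr 3 14:35:49 serveur kernel: device eth0 entered promiscuous mode
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<program>[^\s:\[]+)(?:\[\d+\])?:\s?(?P<msg>.*)$"
)


def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def lines_near(log_text, around, window_minutes=5, program=None):
    """Return log lines within window_minutes of the time of day 'around'.

    If program is given (for example "kernel"), keep only that program's lines.
    """
    center = to_seconds(around)
    window = window_minutes * 60
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # not a syslog-style line
        if abs(to_seconds(match.group("time")) - center) > window:
            continue
        if program is not None and match.group("program") != program:
            continue
        hits.append(line)
    return hits
```

For example, lines_near(text, "14:35:00", program="kernel") keeps only the kernel messages logged within five minutes of the reported outage time.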
To better understand this, I assume:
Only the RHEL box, out of the 4 VMs, is having the issue.
There are no special ACLs on the ASA.
Is the RHEL box a new install?
ASKER
Yes, this RHEL box is a new install, version 6.4. It's difficult to say whether the other VMs have the problem, because they are very active.
The other boxes run older versions.
I read several posts dealing with power management issues.
I think that could be a clue.
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Power_Management_Guide/#ASPM
I will try disabling ASPM to see if it changes anything.
EDIT: Same problem with ASPM disabled. I don't know where to look now!
Can you post the "sh run" output from the ASA?
What version of VMware are you using?
What type of physical network cards are installed in your ESX/ESXi server?
Are you using Standard or Distributed vSwitches?
ASKER
My version is ESXi 4.1.0 Build 348481.
The NIC is an Intel Corporation 82574L Gigabit.
I'm using Standard vSwitches (I run standalone ESXi hosts with no vCenter Server).
Thanks
Have you tried upgrading the VMware drivers for the Intel NIC? I know you are running ESXi 4.1, but I no longer see drivers listed for 4.1. You could try installing the 5.x drivers, but I would suggest trying this on a test ESXi server first.
https://my.vmware.com/web/vmware/info/slug/datacenter_cloud_infrastructure/vmware_vsphere_with_operations_management/5_1#drivers_tools
ASKER
I haven't tried that.
It's not so easy because the server is not mine (I rent a dedicated server at OVH) and I didn't install VMware ESXi myself; I only configured the VMs on it.
I have no physical access to that machine, and I run production VMs on it.
I don't have any other machine with the same configuration (same NIC).
For the moment, I have added a cron task that pings my firewall every two minutes.
With that "ugly solution" the server remains reachable.
If I run that task only every five minutes, I lose the network connection and have to "wake up" the VM through the VMware vSphere Client or wait for the next ping!
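The cron workaround described above could look roughly like the sketch below. It assumes a Linux guest with the iputils ping command (the -c and -W options) and a Python interpreter; the function names, the script path and the cron line in the comment are hypothetical, and 172.16.10.254 is the firewall address from the question.

```python
import os
import subprocess

# Hypothetical crontab entry to run a script built on this every two minutes:
#   */2 * * * * /usr/bin/python /usr/local/bin/keepalive.py


def build_ping_command(host, count=1, timeout_s=2):
    """Build a ping command line that sends 'count' echo requests to host."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def keepalive(host, runner=subprocess.call):
    """Ping host once; return True if the ping command exits with status 0.

    Output is discarded so that cron does not mail every result.
    The runner parameter exists only to make the function easy to test.
    """
    with open(os.devnull, "w") as devnull:
        status = runner(build_ping_command(host), stdout=devnull, stderr=devnull)
    return status == 0
```

A script that calls keepalive("172.16.10.254") and is scheduled every two minutes reproduces the workaround. It only hides the symptom and does not explain why the VM becomes unreachable when idle.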
According to the build number you provided, you are only running ESXi 4.1 Update 1. The latest update and patch level released for ESXi 4.1 is Update 3 Patch 7 (Build 988178). I would suggest you talk to OVH about applying some critical updates to your hosted ESXi server.
ASKER
You're probably both right.
I'm going to open a ticket with OVH.
I have seen this type of issue; it is normally caused by port security on the switch side. You can try shutting one of the old VMs down and see if the new one stays up without the persistent ping.