coolvds
asked on
CentOS 6 (Fedora 14) KVM sporadic failures
The described behavior occurs on both Fedora 14 and CentOS 6 installs.
I migrated to CentOS 6 in the hope that this issue would go away, but it didn't.
The situation:
On one server ~90 VMs are running. From time to time VMs go down for no apparent reason.
Part of the log from around that time:
Oct 10 06:27:52 vd002 kernel: Pid 7999(qemu-kvm) over core_pipe_limit
Oct 10 06:27:52 vd002 kernel: Skipping core dump
.
Oct 10 06:27:52 vd002 collectd[5879]: Not sleeping because the next interval is 103.306590 seconds in the past!
Oct 10 06:27:52 vd002 collectd[5879]: uc_update: Value too old: name = vd002.local/load/load; value time = 1318228072; last cache update = 1318228072;
Oct 10 06:27:52 vd002 collectd[5879]: uc_update: Value too old: name = vd002.local/memory/memory-used; value time = 1318228072; last cache update = 1318228072;
.
Oct 10 06:27:54 vd002 kernel: br0: port 70(VM165) entering disabled state
Oct 10 06:27:54 vd002 kernel: device VM165 left promiscuous mode
Oct 10 06:27:54 vd002 kernel: br0: port 70(VM165) entering disabled state
Oct 10 06:27:56 vd002 ntpd[5769]: Deleting interface #76 VM, fe80::fc54:ff:feda:5763#123, interface stats: received=0, sent=0, dropped=0, active_time=36236 secs
Oct 10 06:28:04 vd002 kernel: Pid 23541(qemu-kvm) over core_pipe_limit
Oct 10 06:28:04 vd002 kernel: Skipping core dump
and so on
20 of the 90 VMs went down.
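The "over core_pipe_limit" / "Skipping core dump" lines mean the kernel hit its limit on how many crashing processes it will pipe to the core-dump helper at once (abrt on a stock CentOS 6 install), so the qemu-kvm cores were discarded. A minimal sketch of how to check and lift that limit so the next crash actually leaves a dump, assuming the default piped core_pattern:
# cat /proc/sys/kernel/core_pattern        # a leading '|' means cores are piped to a helper such as abrt
# cat /proc/sys/kernel/core_pipe_limit     # the limit the messages above say was exceeded
# sysctl -w kernel.core_pipe_limit=0       # 0 = no limit on concurrent piped dumps; add to /etc/sysctl.conf to persist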
collectd is gathering statistics from the node server and from the VMs via collectd-libvirt.
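For reference, the collectd side is the stock libvirt plugin; the relevant collectd.conf section looks roughly like this (a sketch - option names are from the collectd libvirt plugin, the values are assumptions rather than the actual config on this host):
LoadPlugin libvirt
<Plugin libvirt>
  Connection "qemu:///system"
  RefreshInterval 60
  HostnameFormat name
</Plugin>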
VMs can be started normally after the failure.
Swappiness is 0
# sysctl -a|grep swappi
vm.swappiness = 0
There is enough free RAM on the node server. Swap is present but unused by the system.
Any ideas / help / comments are greatly appreciated.
ASKER CERTIFIED SOLUTION
ASKER
Thank you
ASKER
The version of qemu-kvm (CentOS's up-to-date package) is qemu-kvm-0.12.1.2-2.113.el
About the dumps - I can at least get an idea of how to trace the cause of the segfaults.
But anyway, I'll proceed with this issue after updating to the latest available qemu-kvm.
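Roughly, the plan for tracing the segfaults and updating (a sketch - paths assume a stock CentOS 6 box with abrt; the dump location differs between abrt versions, and the dump path below is hypothetical):
# ls /var/spool/abrt/                                                    # where abrt keeps crash dumps (older abrt releases used /var/cache/abrt)
# gdb /usr/libexec/qemu-kvm /var/spool/abrt/ccpp-<time>-<pid>/coredump   # hypothetical dump path; run 'bt' inside gdb for a backtrace
# yum update qemu-kvm && rpm -q qemu-kvm                                 # pull the latest CentOS package and confirm the installed version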