Accidentally deleted a Xen pool-wide bonded network: how should it have been fixed? (Xen 1.06, XenCenter 6.2)

Posted on 2013-12-17
Medium Priority
Last Modified: 2016-11-23
I made a mistake using XenCenter, and I freely admit that. I am trying to pinpoint what damage I caused and what the fix should have been.

Our production Xen pool consists of 9 Xen boxes and Dell iSCSI-attached SAN storage. VMs were split between local storage and the SAN.
XEN-4 was the pool master.

Xen-8 was not working correctly. VMs could be assigned to it as their home server, but as soon as I started a VM it would move to a different Xen box; stopping the VM returned it to Xen-8.

My horrible mistake!!!!
Using XenCenter, I went to Xen-8 and clicked Networking. Forgetting to put the server into maintenance mode, I selected the pool-wide Bond 0 and deleted it, completely removing Bond 0 from the entire pool. (Sickening thud.)

What still worked at this point:
All VMs on the Xen boxes' local storage continued to work fine. My websites loaded and were browsable, SQL was working, etc. However, none of the VMs stored on the SAN responded.

What would the proper fix have been? I assumed it would involve putting every Xen box except the pool master into maintenance mode, or something like that, and figured on around 2 to 3 hours of downtime.
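For reference, XenCenter's "Enter Maintenance Mode" roughly corresponds to disabling the host and then evacuating its running VMs. The sketch below assumes it runs in dom0 of a pool host with the `xe` CLI; the function names are hypothetical. `xe host-evacuate` can only move VMs that are able to migrate, so VMs on a host's local storage would most likely need to be shut down first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers approximating XenCenter's maintenance mode with the xe CLI.
# Assumes they run in dom0 of a host in the pool.

enter_maintenance() {
  local host_name=$1 host_uuid
  host_uuid=$(xe host-list name-label="$host_name" --minimal)
  if [ -z "$host_uuid" ]; then
    echo "no host named $host_name" >&2
    return 1
  fi
  # Stop the pool from starting new VMs on this host...
  xe host-disable uuid="$host_uuid" || return 1
  # ...then migrate its running VMs elsewhere (fails for VMs that cannot migrate).
  xe host-evacuate uuid="$host_uuid"
}

exit_maintenance() {
  local host_name=$1 host_uuid
  host_uuid=$(xe host-list name-label="$host_name" --minimal)
  [ -n "$host_uuid" ] || return 1
  xe host-enable uuid="$host_uuid"
}

# Usage (on a pool host): enter_maintenance Xen-8 ; ...fix... ; exit_maintenance Xen-8
```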

It seems we just needed to re-establish the pool-wide network bond.
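A CLI sketch of re-establishing the bond, assuming it runs in dom0 of the pool master. The network label, the NIC names (eth0/eth1) and the function names are hypothetical; use whichever NICs were actually in Bond 0. It creates one pool-wide network and then bonds the same two NICs on each host onto it. No `mode=` is passed, so the default bond mode applies; set it to match the original bond if that differed. If the iSCSI traffic used an IP address on the bond, that interface's IP configuration would also need restoring before the SAN storage can reconnect.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: rebuild a deleted pool-wide bond from the pool master's dom0.
# The network label and NIC names passed in are assumptions, not read from the pool.

recreate_pool_bond() {
  local bond_name=$1 nic_a=$2 nic_b=$3
  local net_uuid host_uuid pif_a pif_b

  # One network object is shared by every host in the pool.
  net_uuid=$(xe network-create name-label="$bond_name") || return 1

  # xe --minimal prints comma-separated UUIDs; turn them into words for the loop.
  for host_uuid in $(xe host-list --minimal | tr ',' ' '); do
    pif_a=$(xe pif-list host-uuid="$host_uuid" device="$nic_a" --minimal)
    pif_b=$(xe pif-list host-uuid="$host_uuid" device="$nic_b" --minimal)
    if [ -z "$pif_a" ] || [ -z "$pif_b" ]; then
      echo "host $host_uuid: $nic_a or $nic_b not found" >&2
      return 1
    fi
    # No mode= given, so the default bond mode applies.
    xe bond-create network-uuid="$net_uuid" pif-uuids="$pif_a,$pif_b" || return 1
  done
}

# Once the network is back, retry attaching storage that dropped off (e.g. the iSCSI SR).
replug_storage() {
  local pbd
  for pbd in $(xe pbd-list currently-attached=false --minimal | tr ',' ' '); do
    xe pbd-plug uuid="$pbd" || echo "could not plug PBD $pbd" >&2
  done
}

# Usage: recreate_pool_bond "Bond 0" eth0 eth1 && replug_storage
```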

I went and told the primary administrator what I'd done. He decided to take the opportunity not just to fix the issue but also to make XEN-1 the pool master. I believe that in doing this he started removing Xen boxes from the pool, which is known to format the local storage of the Xen box being removed. After he rebuilt the pool with XEN-1 as the master, he found that all the local VMs were gone. He then blamed me for this, even though I had tested extensively and knew the VMs were there when he started.
We have been down for days, he is blaming the entire downtime on my mistake, and my job is in jeopardy.

I would love to know, step by step, what the proper resolution should have been, so I can take it into the retro meeting. Perhaps I'm wrong and removing servers from the pool was the only recourse, but I don't think so.

I'm the first to admit my idiot mistake, but I believe this guy aggravated the situation and caused the extended downtime.
Question by:adamant40

Expert Comment

by:Ayman Bakr
ID: 39729188
It is really frustrating how some people abuse the situation and hide their ignorance by pushing their mistakes onto others.

Your primary admin clearly does not know how to administer XenServer. It is very strange that he removed all the XenServers from the pool just to make Xen-1 the pool master! You can do that as follows:

1. Open the console on the host you want to make the pool master (Xen-1).
2. If HA is enabled, disable it with this command:
xe pool-ha-disable

3. Assuming your current pool master, Xen-4, is functioning properly (which I think it is in your case), run the following commands:
xe host-list

Note the UUID of the XenServer you want to make the master (Xen-1) from the output above and insert it into the following command:
xe pool-designate-new-master host-uuid=<uuid of Xen-1>

4. Re-enable HA, naming the SR to use for the heartbeat:
xe pool-ha-enable heartbeat-sr-uuids=<uuid of the heartbeat SR>

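Steps 2 to 4 can also be wrapped in one function that looks up the UUID by name and only touches HA if it was enabled. This is a sketch under assumptions: it runs in dom0 of a pool host, you supply the heartbeat SR UUID, the function name is hypothetical, and the retry loop's length is an arbitrary guess at how long the pool takes to settle after the master changes.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around steps 2-4 above. Run in dom0 of a host in the pool.

promote_master() {
  local host_name=$1 heartbeat_sr=$2 host_uuid ha_enabled i
  host_uuid=$(xe host-list name-label="$host_name" --minimal)
  if [ -z "$host_uuid" ]; then
    echo "no host named $host_name" >&2
    return 1
  fi

  # Remember whether HA was on, so it is only re-enabled if it was enabled before.
  ha_enabled=$(xe pool-list params=ha-enabled --minimal)
  if [ "$ha_enabled" = "true" ]; then
    xe pool-ha-disable || return 1
  fi

  xe pool-designate-new-master host-uuid="$host_uuid" || return 1

  # The pool is briefly unreachable while the new master takes over; wait a little.
  for i in 1 2 3 4 5 6; do
    xe host-list --minimal >/dev/null 2>&1 && break
    sleep 10
  done

  if [ "$ha_enabled" = "true" ]; then
    xe pool-ha-enable heartbeat-sr-uuids="$heartbeat_sr"
  fi
}

# Usage: promote_master Xen-1 <uuid of the HA heartbeat SR>
```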

As for your initial issue: one of the most dangerous practices among IT professionals is testing in production. Your primary admin complicated things by attempting something in production that he had never tried before, while a major issue was still unresolved. He should have focused on resolving that issue first and left the housekeeping work for later.

Yes, you are correct: you should have focused on re-establishing the pool-wide network bond. Sites like the following could have helped you, even if you had to do it in a maintenance window with production down for a couple of hours:

I am not sure how bad your situation is now, but this might help you get your VMs back:
1. Find the UUIDs of the failing VMs
2. Reset the power state on these VMs
3. Restart the VMs
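The three steps above map onto standard `xe` commands. A minimal sketch, assuming dom0 access and that you have confirmed the VM is not actually running on any host: forcing the recorded power state of a VM that is still live can lead to data corruption. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for steps 1-3 above. Only use once the VM is known to be halted.

recover_vm() {
  local vm_name=$1 vm_uuid
  # 1. Find the VM's UUID from its name.
  vm_uuid=$(xe vm-list name-label="$vm_name" --minimal)
  case $vm_uuid in
    "") echo "no VM named $vm_name" >&2; return 1 ;;
    *,*) echo "more than one VM named $vm_name; use its UUID" >&2; return 1 ;;
  esac
  # 2. Force the pool's record of the VM back to Halted.
  xe vm-reset-powerstate uuid="$vm_uuid" --force || return 1
  # 3. Start it again, letting the pool choose the host.
  xe vm-start uuid="$vm_uuid"
}

# Usage: recover_vm "SQL server"   (the VM name is illustrative)
```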
I hope this recovery guide helps:


Author Comment

ID: 39736204
Thank you for your response and your comments. Sorry for the delay; I have been working crazy hours trying to rebuild the entire production infrastructure. I looked through the documentation you listed for the exact fix for my mistake, but my limited skills did not let me find the specific, step-by-step instructions that should have been carried out to repair it. I was hoping to have something like that to take to the review. Could you provide that? Something like: "In the event that someone is a moron and removes the global bonded network from the global pool, you would fix it by doing the following?"

Accepted Solution

by:Ayman Bakr
ID: 39736790
Maybe the steps at the end of this article could have helped:


Author Closing Comment

ID: 39736870
Thanks very much; I wish I'd had this information at the time of my error.
