Accidentally deleted Xen pool-wide bonded network, need to know how it should have been fixed. Xen 1.06, XenCenter 6.2

Posted on 2013-12-17
Medium Priority
Last Modified: 2016-11-23
I made a mistake using XenCenter, and I freely admit that. I am trying to pinpoint what damage I caused and what the fix should have been.

The XEN production pool consisted of 9 Xen boxes and Dell iSCSI-attached SAN space. VMs were mixed between local storage and SAN space.
XEN-4 was the pool master.

Xen-8 was not working correctly. VMs could be assigned to the box as their home server, but as soon as I started a VM it would move to a different Xen box. Stopping the VM returned it to Xen-8.

My horrible mistake!!!!
Using XenCenter, I went to Xen-8 and clicked on Networking. Forgetting to place the server into maintenance mode, I clicked on the pool-wide Bond 0 and deleted it, completely removing network Bond 0 from the entire pool. (Sickening thud.)

What still worked at this point:
All VMs that were on the Xen boxes' local storage continued to work fine. My websites loaded and were browsable, SQL was working, etc. However, no VMs that were stored on the SAN responded.

What would the proper method of fixing this issue have been? I would have assumed putting all Xen boxes except the pool master in maintenance mode, or some such. I figured around 2 to 3 hours of downtime to fix.

It seems like we just needed to re-establish the pool-wide network bond.

I went and told the primary administrator what I'd done. He decided to take the opportunity not just to fix the issue but to change the pool to make XEN-1 the pool master. I believe that in doing this he started removing Xen boxes from the pool. This act is known to format the internal storage of the Xen box. After he rebuilt the pool with XEN-1 as the master, he found that all the local VMs were gone. He then blamed me for this, even though I had tested extensively and knew they were there when he started.
We have been down for days, he is blaming the entire downtime on my mistake, and my job is in jeopardy.

I would love to know what the proper resolution for this issue should have been, step by step, so I can take this into the retro meeting. Perhaps I'm wrong and removing servers from the pool was the only recourse, but I don't think so.

I'm the first one to admit my idiot mistake, but I believe this guy aggravated the situation and caused the extensive downtime.
Question by:adamant40
LVL 23

Expert Comment

by:Ayman Bakr
ID: 39729188
It is really frustrating how some people abuse a situation and hide their ignorance by pinning their mistakes on others.

The primary admin is really ignorant about XenServers and how to administer them. It is very strange that he needed to remove all the XenServers from the pool just to make Xen-1 the pool master! You can do that as follows:

1. Open the console on the host you want to make the pool master, which is Xen-1.
2. If you have HA, you need to disable it with this command:
xe pool-ha-disable

3. Assuming that your pool master Xen-4 is functioning properly (which I think it is in your case), issue the following command:
xe host-list

Note down the uuid of the XenServer you want to make the master (Xen-1) from the output above and insert it in the following command:
xe pool-designate-new-master host-uuid=<uuid of Xen-1>

4. Re-enable HA:
xe pool-ha-enable
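
For reference, steps 2 to 4 above could be wrapped into one small shell helper. This is only a sketch: the function name `designate_master` and the `XE` dry-run variable are made up for illustration and are not part of XenServer. By default it just prints the `xe` commands from this answer so they can be reviewed before anything is run.

```shell
# Dry-run by default: XE prints each command instead of running it.
# Set XE=xe on a pool member to run them for real (assumes xe is on PATH).
XE="${XE:-echo xe}"

# designate_master <host-uuid> [ha]   (hypothetical helper name)
#   <host-uuid>  UUID of the new master, taken from `xe host-list`
#   ha           pass the literal word "ha" if HA is enabled on the pool
designate_master() {
    if [ -z "${1:-}" ]; then
        echo "usage: designate_master <host-uuid> [ha]" >&2
        return 1
    fi
    # Step 2: disable HA first, but only when asked to.
    [ "${2:-}" = "ha" ] && $XE pool-ha-disable
    # Step 3: promote the chosen host.
    $XE pool-designate-new-master host-uuid="$1"
    # Step 4: turn HA back on (written as in the answer above).
    [ "${2:-}" = "ha" ] && $XE pool-ha-enable
    return 0
}
```

Calling `designate_master <uuid of Xen-1> ha` in dry-run mode prints the three commands in order; nothing touches the pool unless `XE=xe` is set.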

As for your initial issue: the most dangerous practice performed by many IT professionals is testing in production! Your primary admin also complicated things by trying to achieve something on production that he had never tried before while a major issue was outstanding. Instead, he should have focused on resolving the major issue and left the housekeeping work for later.

Yes, you are correct: you should have focused on re-establishing the pool-wide network bond. Sites like the following could have helped you, even if you needed to do it in a maintenance window where production would be down for a couple of hours or so:

I am not sure how bad your situation is now, but maybe this would help to get your VMs back:
1. Find the uuids of the failing VMs
2. Reset the power state on these VMs
3. Restart the VMs
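
The three steps above might look roughly like this from the XenServer console. Treat it as a sketch under assumptions: the helper names `list_vms` and `recover_vm` and the `XE` dry-run variable are hypothetical, and by default the commands are only printed, not executed.

```shell
# Dry-run by default: XE prints each command instead of running it.
# Set XE=xe on a pool member to run them for real (assumes xe is on PATH).
XE="${XE:-echo xe}"

# Step 1: list VMs with their recorded power state to spot the failing ones.
list_vms() {
    $XE vm-list params=uuid,name-label,power-state
}

# Steps 2 and 3: reset the recorded power state, then start the VM.
# Caution: only reset a VM that is certainly not running on any host.
recover_vm() {
    if [ -z "${1:-}" ]; then
        echo "usage: recover_vm <vm-uuid>" >&2
        return 1
    fi
    $XE vm-reset-powerstate uuid="$1" --force
    $XE vm-start uuid="$1"
}
```

Run `list_vms` first, then `recover_vm <vm-uuid>` once per affected VM.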
Hope this recovery guide helps:

Author Comment

ID: 39736204
Thank you for your response and your comments. Sorry for the delay, I've been working crazy hours trying to rebuild the entire production infrastructure. I tried looking through the documentation you listed for the exact fix for my mistake, but my low-level skills did not allow me to find the specific, step-by-step instructions that should have been carried out to repair it. I was hoping to have something like that to take to the review. Is it possible for you to provide that? "In the event that someone is a moron and removes the global bonded network from the global pool, you would fix it by doing the following?"
LVL 23

Accepted Solution

Ayman Bakr earned 1500 total points
ID: 39736790
Maybe the steps at the end of this article could have helped:
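
The article itself is not reproduced in this thread, so the following is not its content. As a rough sketch only, re-creating a bonded network from the `xe` CLI generally comes down to three commands. The helper names, the `XE` dry-run variable, and the idea of bonding exactly two NICs are assumptions for illustration; by default nothing is executed, the commands are just printed.

```shell
# Dry-run by default: XE prints each command instead of running it.
# Set XE=xe on a pool member to run them for real (assumes xe is on PATH).
XE="${XE:-echo xe}"

# Step 1 (once): create the network object the bond will attach to.
create_bond_network() {
    $XE network-create name-label="$1"
}

# Step 2 (per host): list that host's interfaces to pick the bond members.
list_host_pifs() {
    $XE pif-list host-uuid="$1" params=uuid,device,MAC
}

# Step 3 (per host): bond two PIFs onto the network from step 1.
#   $1 = network UUID, $2 and $3 = PIF UUIDs of the two NICs
create_bond() {
    if [ -z "${3:-}" ]; then
        echo "usage: create_bond <network-uuid> <pif-uuid-1> <pif-uuid-2>" >&2
        return 1
    fi
    $XE bond-create network-uuid="$1" pif-uuids="$2,$3"
}
```

The network UUID printed by the real `xe network-create` feeds into `create_bond`. Whether the SAN storage would have reconnected on its own once the bond was back is not something this thread establishes.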

Author Closing Comment

ID: 39736870
Thanks very much, wish I'd had this information available to me at the time of my error.
