Solved

VMware FC failover to second target

Posted on 2011-04-28
Last Modified: 2012-07-03
Hello all,

I have a question about VMware ESXi 4.x and Fibre Channel failover from one target/storage processor to another.

We have an Openfiler active/passive cluster in which one server takes over from the other on shutdown, failure, etc.
We are using shared direct-attached storage arrays that connect to both servers.  When a server takes over, it mounts all the appropriate filesystems and takes over a virtual IP.  Consequently, all our iSCSI sessions fail over to it.

With iSCSI this is very simple, since we point to the virtual IP.
About a year ago we began moving to Fibre Channel because of speed and cost.  (2G FC is much cheaper than 10GE, and bonding 1GE links in VMware for iSCSI has been lackluster.)

While both target servers are set up and working, we are unsure how to make VMware fail over to the other target server, since the two have different WWNs and only one server is presenting LUNs on startup.

I tried looking down the path of virtual WWNs (like with the virtual IP), but no luck there.

Thoughts?

Thanks!!
-Cheers, Peter.
Question by:ein_mann_betrieb
 
LVL 42

Accepted Solution

by:
Paul Solovyovsky earned 2000 total points
ID: 35488943
You need to configure round robin on the datastores.  Typically you set the path selection policy to round robin on each datastore, set the IOPS limit to the storage vendor's recommendation (the default is 1,000), and configure the queue depth on the HBAs accordingly (I believe QLogic is 65 and Emulex is 16).

here's an example
0
 
LVL 42

Expert Comment

by:Paul Solovyovsky
ID: 35488945
oops..forgot the link

http://www.ivobeerens.nl/?p=465
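
To make the round robin suggestion concrete, here is a minimal sketch of how a round robin path selector with an IOPS limit behaves. It is a simplified model and not ESXi's actual implementation; the class names, the WWN values, and the `iops_limit` parameter name are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    target_wwn: str      # WWN of a storage processor port (illustrative value)
    alive: bool = True   # False once the host has marked the path dead


@dataclass
class RoundRobinSelector:
    paths: list
    iops_limit: int      # I/Os sent down one path before rotating to the next
    _index: int = 0
    _sent: int = 0

    def next_path(self) -> Path:
        """Return the path to use for the next I/O, skipping dead paths."""
        if not any(p.alive for p in self.paths):
            raise RuntimeError("all paths dead: datastore is unreachable")
        current = self.paths[self._index]
        if not current.alive or self._sent >= self.iops_limit:
            # Rotate to the next alive path (may wrap back to the same one).
            for step in range(1, len(self.paths) + 1):
                candidate = (self._index + step) % len(self.paths)
                if self.paths[candidate].alive:
                    self._index = candidate
                    break
            self._sent = 0
        self._sent += 1
        return self.paths[self._index]
```

With two alive paths the selector alternates between them every `iops_limit` I/Os; when one path is marked dead, all I/O moves to the survivor. Note that this only helps if the host already knows about both paths.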
0
 
LVL 5

Author Comment

by:ein_mann_betrieb
ID: 35492171
Hi paulsolov,
   How does ESXi know about the other WWN?  Does it rescan on failure and pick up the datastore based on the signature of the VMFS?
If so, do you know what kind of failover time would be typical for it to find the other node and come back online?

Thanks!  -Cheers, Peter.
0

 
LVL 42

Expert Comment

by:Paul Solovyovsky
ID: 35494286
Check the datastore and see whether it lists the second target as a path. By default ESXi uses the most recently used path, but it will fail over.
0
 
LVL 5

Author Comment

by:ein_mann_betrieb
ID: 35695399
Hi.

Sorry... I haven't abandoned this question, but I haven't got a test rig set up yet.
I'm waiting for a spare server to use for the failover test.  Will report back as soon as I can.

Thanks.  -Cheers, Peter.
0
 
LVL 5

Author Comment

by:ein_mann_betrieb
ID: 35760383
Hi paulsolov,
  Sorry for the delay.   So I set up a second target server with direct-attached shared storage.  When I fail over the storage system to the other server, VMware shows both paths as "Dead", and it doesn't recognize the path via the new storage server until I reboot the ESXi host.

I must be missing something...

Thanks!  -Cheers, Peter.
0
 
LVL 42

Expert Comment

by:Paul Solovyovsky
ID: 35760515
What do you mean by failing over to the other system?  Are you talking about VMware HA?  Are both systems in a VMware cluster or an MSCS cluster?
0
 
LVL 5

Author Comment

by:ein_mann_betrieb
ID: 35765569
Hi paulsolov,
  Not HA.  In this case I have one ESXi server and two storage processors in an Active/Standby configuration.

When I start up ESXi it sees only the Active storage processor.  That makes sense, since the other one is in standby mode and is not presenting LUNs.

If I force a failure on the Active storage processor, the Standby storage processor takes over, but I can't seem to get ESXi to find the datastore again until I reboot ESXi.

Thoughts?

Thanks!  -Cheers, Peter.
0
 
LVL 42

Expert Comment

by:Paul Solovyovsky
ID: 35765648
What make and model is the SAN?
0
 
LVL 5

Author Comment

by:ein_mann_betrieb
ID: 35765717
Hi paulsolov,
   It's based on Openfiler and the Linux scst-fc software.
Our HBAs and switches are all QLogic, specifically QLA2342 HBAs and SanBox2 switches.
0
 
LVL 42

Assisted Solution

by:Paul Solovyovsky
Paul Solovyovsky earned 2000 total points
ID: 35766210
Most likely a storage SP issue.  I have done NetApp, HP, etc., and as long as the path is up on the initial controller, the second controller picks up where the first one left off, and there is usually a battery-backed cache that preserves the data.  This is why I avoid Openfiler in a production environment: the software is good, but you don't always have the tight integration of software and hardware, or the support that comes along with it.

Normally the SP should take over the function of the failed controller. In this case it's either not failing the dead path fast enough or not picking up the new path.

Since Openfiler is not on the VMware HCL, these types of issues tend to occur because the configuration hasn't been fully tested and certified.
0
 
LVL 5

Author Comment

by:ein_mann_betrieb
ID: 35891403
Hi paulsolov,
  So I think I have my answer...  I spent several hours this past weekend trying to rule out all the components.  It seems that the HBA in one of my SPs had developed some type of failure... Not sure what... but when I got a PCI SERR ERR code thrown, I knew for sure something had gone bad.

  I replaced the HBA and now Round Robin works like a champ...  I copied several gigs of data and the MD5 sums all matched... yay.

  It seems the trick is to present the same LUNs on both SPs at the same time.  So it's not something you keep offline and bring online in case of failure.  It needs to be more of an active/active config in order to work.

  I'm still not sure why rescanning the HBAs in ESXi doesn't show any new LUNs until a reboot.  But maybe this is a limitation of the free license?
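
A minimal sketch of why presenting the LUNs on both SPs at the same time matters, assuming a deliberately simplified model in which the host only learns paths when it scans and never rescans afterwards. The function names and the `sp-a`/`sp-b` labels are made up for illustration, and the model says nothing about why a manual rescan did not help here.

```python
def discover_paths(targets, lun_id):
    """Return the targets that present lun_id at scan time."""
    return [wwn for wwn, luns in targets.items() if lun_id in luns]


def usable_paths(known_paths, targets, lun_id):
    """Of the paths the host already knows, return those still serving lun_id."""
    return [wwn for wwn in known_paths if lun_id in targets.get(wwn, set())]


# Active/standby: only SP A presents LUN 0 when the host scans.
standby_setup = {"sp-a": {0}, "sp-b": set()}
known = discover_paths(standby_setup, lun_id=0)        # host learns one path
after_failover = {"sp-a": set(), "sp-b": {0}}           # SP B takes over
print(usable_paths(known, after_failover, lun_id=0))   # no known path survives

# Both SPs present LUN 0 all the time.
both_setup = {"sp-a": {0}, "sp-b": {0}}
known = discover_paths(both_setup, lun_id=0)           # host learns two paths
sp_a_failed = {"sp-a": set(), "sp-b": {0}}
print(usable_paths(known, sp_a_failed, lun_id=0))      # the SP B path remains
```

In the first case the host has nothing to fail over to, because the standby SP was never a known path; in the second, the multipathing policy can simply move I/O to the surviving path.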

Thanks ever so much!
  -Cheers, Peter.
0
 
LVL 42

Expert Comment

by:Paul Solovyovsky
ID: 35891581
The free and paid licenses have the same functionality when it comes to storage.  Make sure you're running the latest ESXi 4.1 Update 1.
0
 

Expert Comment

by:murdochka
ID: 38148504
Hi ein_mann_betrieb

Is it at all possible for you to elaborate on how you got your setup to work?

I am a bit confused. Is DRBD handling the replication, or is VMware doing that for you?

If you could explain in a bit more detail, that would be a great help.

Regards
0
 
LVL 5

Author Comment

by:ein_mann_betrieb
ID: 38148824
Hi murdochka,
   We are not using DRBD for this setup, but that is certainly a possible way of implementing it.  We are using special clustering RAID hardware whereby two servers are both attached to the same external disk shelves.  The RAID cards in both servers work in lock-step with one another, so either controller can take over completely if the other fails.

  When the heartbeat daemon detects a failure, it runs scripts on the failover server.  The failover server then mounts all the LVM volumes we have on the disks, takes over the virtual iSCSI IP addresses, and brings up the daemons for NFS, iSCSI target, FC target, etc.
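
The takeover sequence described above could be sketched roughly as follows. This is not the actual heartbeat script from this setup; the function name is hypothetical, and the volume group, device, address, interface, and service names in the usage example are placeholders. The function only builds the ordered command list and does not execute anything.

```python
def build_failover_plan(volume_groups, mounts, virtual_ips, services):
    """Return the takeover steps as an ordered list of argv lists."""
    plan = []
    # 1. Activate the LVM volume groups on the shared disks.
    for vg in volume_groups:
        plan.append(["vgchange", "-ay", vg])
    # 2. Mount the filesystems that live on those volumes.
    for device, mount_point in mounts:
        plan.append(["mount", device, mount_point])
    # 3. Take over the virtual IP addresses used by iSCSI clients.
    for cidr, interface in virtual_ips:
        plan.append(["ip", "addr", "add", cidr, "dev", interface])
    # 4. Start the storage daemons (service names are placeholders).
    for name in services:
        plan.append(["service", name, "start"])
    return plan


if __name__ == "__main__":
    plan = build_failover_plan(
        volume_groups=["vg_data"],                       # placeholder
        mounts=[("/dev/vg_data/lv_nfs", "/mnt/nfs")],    # placeholder
        virtual_ips=[("192.0.2.10/24", "eth0")],         # placeholder
        services=["nfs", "iscsi-target", "fc-target"],   # placeholders
    )
    for argv in plan:
        print(" ".join(argv))
```

The ordering is the point of the sketch: storage is activated and mounted before the virtual IP moves, and the target daemons start last so that initiators only reconnect once the volumes are available.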

Hope that helps.  -Cheers, Peter.
0
