• Status: Solved

Exchange Server 2007 SP3 CCR losing access to the FSW, causing random failover

Hi,

Can anyone here please help me with a mysterious case of random Exchange Server mailbox failover? The mailbox server intermittently cannot see the File Share Witness (FSW) hosted on one of the HT servers within the same VLAN.

Server roles in the normal situation:

Mailbox Server (CCR – Stretched Cluster) Nodes
PRODEXMBX01-VM (Active Mailbox, Quorum) – 10.1.1.25
DREXMBX01-VM (Passive mailbox) – 192.168.1.78

Hub Transport and Client Access Server Nodes
PRODEXHTCAS02-VM – 10.1.1.26
PRODEXHTCAS03-VM (FSW holder) – 10.1.1.27
DREXHTCAS02-VM – 192.168.1.79

Early last Saturday morning, for some unknown reason (Critical Event ID 1564), the active mailbox server (PRODEXMBX01-VM) could not access the FSW on the HT server PRODEXHTCAS03-VM, so the mailbox failed over to the DR mailbox server (DREXMBX01-VM).

So I had to manually fail back from DR to production so that both the active mailbox and the quorum were held by the production mailbox server (PRODEXMBX01-VM).

On Sunday morning, Critical Event ID 1564 occurred again. This time only the quorum failed over to the DR mailbox server (DREXMBX01-VM), while the active mailbox role is still held by the production Exchange server (PRODEXMBX01-VM). So the situation is now as follows:

Mailbox Server (CCR – Stretched Cluster) Nodes
PRODEXMBX01-VM (Active Mailbox) – 10.1.1.25
DREXMBX01-VM (Passive mailbox, Quorum) – 192.168.1.78

Hub Transport and Client Access Server Nodes
PRODEXHTCAS02-VM – 10.1.1.26
PRODEXHTCAS03-VM (FSW holder) – 10.1.1.27
DREXHTCAS02-VM – 192.168.1.79

So what is causing the mailbox servers to be unable to contact the File Share Witness?


Additional details:
All VMs are running on VMware vSphere 5.1u1, and the ESXi servers are running on HP blades.

Pinging and running tracert against the FSW server gives an immediate reply, with no switch or firewall device in between.
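
Since ping and tracert only exercise ICMP, a clean reply does not prove that the witness share itself stays reachable. One way to narrow this down might be to probe the file-sharing port on the FSW holder at a fixed interval and keep the timestamps, so any gaps can be lined up against the Event ID 1564 entries. The following is only a minimal sketch under assumptions: it assumes the share is reached over SMB on TCP 445, the function names are made up for illustration, and the attempt count and interval are arbitrary.

```python
import socket
import time
from datetime import datetime, timezone

FSW_HOST = "PRODEXHTCAS03-VM"  # FSW holder from the server list above
SMB_PORT = 445                 # assumption: the witness share is served over SMB on TCP 445


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and name-resolution errors
        return False


def probe_line(host, port, ok, when=None):
    """Format one timestamped log line for a single probe result."""
    when = when or datetime.now(timezone.utc)
    status = "OK" if ok else "FAILED"
    return f"{when.isoformat()} {host}:{port} {status}"


def monitor(host, port, attempts, interval_seconds):
    """Probe host:port `attempts` times, pausing between probes; return the log lines."""
    lines = []
    for i in range(attempts):
        lines.append(probe_line(host, port, tcp_reachable(host, port)))
        if i < attempts - 1:
            time.sleep(interval_seconds)
    return lines


# Hypothetical usage, run from the mailbox node for about an hour:
#     for line in monitor(FSW_HOST, SMB_PORT, attempts=120, interval_seconds=30):
#         print(line)
```

A successful TCP connect only shows that the port answers. It does not check share permissions or whether the cluster account can actually read the witness directory, so a clean log here would point the investigation elsewhere rather than rule the network out completely.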
 
Jamie McKillop Commented:
Hello,

Any backups running when this happens?

-JJ
 
Senior IT System Engineer (IT Professional, Author) Commented:
Hi James,

No, there was no backup running at the time the critical and error events were logged.
Both the network team and the VMware team showed me that there were no issues in their event logs during that time window either.

Does the FSW directory permission need to include any other security group, or is the cluster AD account alone enough?
 
compdigit44 Commented:
Has anything changed in your environment? Are you using LACP? What is your NIC teaming policy and NIC type?
 
Senior IT System Engineer (IT Professional, Author) Commented:
Hi Comp,

The Exchange servers are all virtual machines running on VMware ESXi. The ESXi hardware is HP blade servers with a trunk connection to the Cisco Nexus 7000 switch.

The only teaming that I can see is on the uplinks of the vSwitch at the ESXi level.

Because the mailbox server is CCR, it has two networks: the VM public network, where data flows to the production VLAN, and the VM heartbeat network, which only carries traffic between the active and passive nodes.
 
compdigit44 Commented:
In VMware, what is the NIC teaming policy on your vSwitch?
 
Senior IT System Engineer (IT Professional, Author) Commented:
vSwitch Policy
Here it is for the vNIC attached to the CCR cluster heartbeat network.
The policy is the same for the other VLANs and network labels.
 
MAS (Technical Department Head) Commented:
Is there any switch/router in between these CCR members, or are they on the same host?

Ensure the VMware virtual switch is configured properly.
Ensure your production network NIC is at the top of the network binding order.
 
Senior IT System Engineer (IT Professional, Author) Commented:
"Is there any switch/router in between these CCR members, or are they on the same host?"
No, there is no network device in between, because the physical hosts are two different blade servers in the same c7000 blade enclosure chassis.

"Ensure your production network NIC is at the top of the network binding order."

Is this the binding at the OS layer for the mailbox server?
 
MAS (Technical Department Head) Commented:
"Is this the binding at the OS layer for the mailbox server?"
Yes, you have to set the network binding order in the OS of the mailbox server.
http://support.microsoft.com/kb/894564
 
Senior IT System Engineer (IT Professional, Author) Commented:
Yes, it is already set at the top of the priority order.

Network Binding Order