Albert Widjaja (Australia)

Exchange Server 2007 SP3 CCR losing access to the FSW, causing random failovers

Hi,

Can anyone here please help me with a mysterious case of random Exchange Server mailbox failovers? The mailbox server is unable to see the File Share Witness (FSW) on one of the Hub Transport (HT) servers in the same VLAN.

Server roles in the normal situation:

Mailbox Server (CCR – Stretched Cluster) Nodes
PRODEXMBX01-VM (Active Mailbox, Quorum) – 10.1.1.25
DREXMBX01-VM (Passive mailbox) – 192.168.1.78

Hub Transport and Client Access Server Nodes
PRODEXHTCAS02-VM – 10.1.1.26
PRODEXHTCAS03-VM (FSW holder) – 10.1.1.27
DREXHTCAS02-VM – 192.168.1.79

Last Saturday, early in the morning, for an unknown reason (Critical Event ID 1564), the active mailbox server (PRODEXMBX01-VM) could not access or see the FSW on the HT server PRODEXHTCAS03-VM, so the mailbox failed over to the DR mailbox server (DREXMBX01-VM).

So I had to perform a manual failover from DR back to production so that both the active mailbox and the quorum are held by the production mailbox server (PRODEXMBX01-VM).

On Sunday morning, Critical Event ID 1564 occurred again. This time only the quorum failed over to the DR mailbox server (DREXMBX01-VM), while the active mailbox role is still held by the production Exchange server (PRODEXMBX01-VM). The situation now looks like this:

Mailbox Server (CCR – Stretched Cluster) Nodes
PRODEXMBX01-VM (Active Mailbox) – 10.1.1.25
DREXMBX01-VM (Passive mailbox, Quorum) – 192.168.1.78

Hub Transport and Client Access Server Nodes
PRODEXHTCAS02-VM – 10.1.1.26
PRODEXHTCAS03-VM (FSW holder) – 10.1.1.27
DREXHTCAS02-VM – 192.168.1.79
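The thread does not state the Windows version or quorum mode. Assuming a majority-based model with a file share witness (Node and File Share Majority on Windows Server 2008, or Majority Node Set with a file share witness on Windows Server 2003), each mailbox node and the FSW carry one vote, and a side of the cluster keeps quorum only if it can reach more than half of all votes. A minimal Python sketch of that vote arithmetic; the voter labels and scenarios are hypothetical illustrations, not taken from the event logs:

```python
# Hypothetical labels for the three voters in this CCR cluster
# (assumes one vote per node plus one for the file share witness).
PROD = "PRODEXMBX01-VM"
DR = "DREXMBX01-VM"
FSW = "FSW on PRODEXHTCAS03-VM"
VOTERS = frozenset({PROD, DR, FSW})


def majority(total_votes: int) -> int:
    """Smallest vote count that is strictly more than half of total_votes."""
    return total_votes // 2 + 1


def has_quorum(reachable: set, voters: frozenset = VOTERS) -> bool:
    """True if the voters one side can reach (itself included) form a majority."""
    return len(set(reachable) & voters) >= majority(len(voters))


# Each scenario lists the voters that one side of the cluster can still reach.
SCENARIOS = {
    "everything reachable": {PROD, DR, FSW},
    "FSW lost, heartbeat between nodes intact": {PROD, DR},
    "PROD node cut off from both DR and the FSW": {PROD},
    "DR node that can still reach the FSW": {DR, FSW},
}

if __name__ == "__main__":
    for name, reachable in SCENARIOS.items():
        votes = len(reachable & VOTERS)
        print(f"{name}: {votes}/{len(VOTERS)} votes, quorum={has_quorum(reachable)}")
```

Under this assumed model, losing the FSW by itself still leaves two of three votes; a node drops below a majority only when it loses the FSW and its partner node at the same time.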

So what is causing the mailbox servers to be unable to contact the File Share Witness?


Additional details:
All VMs are running on VMware vSphere 5.1 U1, and the ESXi hosts are HP blade servers.

Pinging and running tracert to the FSW server give immediate replies, and there is no switch or firewall in between.
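Ping and tracert only prove ICMP reachability, while the FSW is a file share that the cluster normally reaches over SMB on TCP port 445. A minimal Python sketch, assuming it is run on the mailbox node and that the share is reached over TCP 445, that repeatedly opens a TCP connection to the FSW holder and prints timestamped failures so they can be lined up with Event ID 1564. The function names are hypothetical; this is not an Exchange or Windows cluster tool:

```python
import socket
import time
from datetime import datetime


def tcp_port_open(host: str, port: int = 445, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch_port(host: str, port: int = 445, interval: float = 5.0,
               rounds: int = 3, timeout: float = 2.0) -> list:
    """Probe host:port `rounds` times, `interval` seconds apart.

    Returns a list of (timestamp, reachable) pairs and prints each failure
    so it can be compared with the time of cluster events.
    """
    results = []
    for i in range(rounds):
        ok = tcp_port_open(host, port, timeout)
        stamp = datetime.now().isoformat(timespec="seconds")
        results.append((stamp, ok))
        if not ok:
            print(f"{stamp} {host}:{port} not reachable")
        if i < rounds - 1:  # no sleep after the last probe
            time.sleep(interval)
    return results
```

For example, `watch_port("10.1.1.27", interval=5, rounds=720)` run on PRODEXMBX01-VM would probe the FSW holder for roughly an hour. An open port only shows that SMB is listening; it does not confirm that the cluster account has the right permissions on the share.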
Jamie McKillop (Canada)

Hello,

Any backups running when this happens?

-JJ
Albert Widjaja (asker)

Hi James,

No, there was no backup running when the critical and error events were logged.
Both the network team and the VMware team showed me that their logs contain no issues during that time window either.

Does the FSW directory need permissions for any other security group, or is granting access to the cluster AD account alone enough?
ASKER CERTIFIED SOLUTION
compdigit44

Hi Comp,

The Exchange servers are all virtual machines running on VMware ESXi. The ESXi hardware is HP blade servers with trunk connections to a Cisco Nexus 7000 switch.

The only teaming I can see is the uplink in the vSwitch at the ESXi level.

Because the mailbox servers form a CCR cluster, there are two networks: the VM public network, where data flows to the production VLAN, and the VM heartbeat network, which only carries traffic between the active and passive nodes.
SOLUTION
Here it is for the vNIC attached to the CCR cluster heartbeat network.
The same policy applies to the other VLANs and network labels.
SOLUTION
Is there any switch or router between these CCR members, or are they on the same host?
No, there is no network device in between, because the physical hosts are two different blade servers in the same c7000 blade enclosure.

Make sure your production network NIC is at the top of the network binding order.

Is this the binding order at the OS level on the mailbox server?
SOLUTION
Yes, it is already set at the top of the priority list.