Jerry Seinfield
asked on
Windows 2008 SP2 & Exchange 2007 CCR failover did not work properly
Hi Experts,
We are seeing a few strange issues in one of our clusters (failover did not work properly).
Our environment is two physical servers running Windows 2008 SP2 (Exchange 2007 SP2 CCR), multiple VMs running Windows 2008 SP2 (file share witness), and multiple CAS/HUB servers (VMs).
Today the cluster was unable to contact the file share witness (on a HUB server), and the active CCR node became unresponsive.
There is also the following error in the System event log, which might be what prevented the cluster resources from coming online:
The Fibre Channel Platform Registration Service could not register the platform with fabric 10:00:00:05:1e:ba:48:00.
In addition, we are seeing Event ID 1230 and 1146 errors with source FailoverClustering and task category Resource Control Manager.
Can anyone point me in the right direction?
The "could not register" error sounds like your FC controller lost communication with the storage/SAN switch. Try checking the FC HBA drivers in Device Manager. Also try running the FC HBA management application and check whether the HBA is bound properly.
ASKER
Any other suggestions?
Can you post the results of Get-ClusteredMailboxServerStatus?
Was the FSW available during this time?
ASKER
When this issue happened, the FSW was not available, and the active node was also frozen (hung state).
Then two of the three quorum voters (the FSW and the active node) were down, so you no longer had quorum and the Exchange services went offline.
You need to determine what happened to your fibre connection.
Check the switch logs.
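The quorum arithmetic behind the reply above can be sketched as follows. This is only a simplified model, assuming a Node and File Share Majority layout with one vote each for the two CCR nodes and one for the file share witness; the names node_a, node_b, and fsw are made up for illustration, and real cluster arbitration involves more than counting votes.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Quorum holds only while a strict majority of all votes is reachable."""
    return votes_online > total_votes // 2


def quorum_state(voters: dict) -> bool:
    """voters maps a (hypothetical) voter name to True if it is reachable."""
    online = sum(1 for reachable in voters.values() if reachable)
    return has_quorum(online, len(voters))


# Assumed layout from this thread: two nodes plus one file share witness.
healthy = {"node_a": True, "node_b": True, "fsw": True}
fsw_lost = {"node_a": True, "node_b": True, "fsw": False}
fsw_lost_and_node_hung = {"node_a": False, "node_b": True, "fsw": False}

for label, voters in [
    ("healthy", healthy),
    ("FSW lost", fsw_lost),
    ("FSW lost and active node hung", fsw_lost_and_node_hung),
]:
    print(label, "-> quorum" if quorum_state(voters) else "-> no quorum")
```

With three voters the cluster tolerates losing any one of them, but losing the witness and a node at the same time leaves one vote out of three, which is below a majority.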
ASKER
I found the following issue on the cluster:
The Fibre Channel Platform Registration Service could not register the platform with fabric 10:00:00:05:1e:ba:48:00
However, the SAN admins state that it is not a cause for concern. At this point, is there anything to check on the Windows 2008 cluster side of things?
Run the following to generate a cluster log file for both nodes; the logs will be saved in the subdirectory clusterlogs under the current directory:
cluster log /g /copy:clusterlogs /level:5
If you don't specify /copy:, a log will be generated on each node under the following directory:
%windir%\Cluster\Reports
You could then analyze these logs to try to determine what happened and when.
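As a starting point for that analysis, here is a rough Python sketch that pulls out only the warning and error lines so you can line them up against the time of the outage. It assumes each log line carries a timestamp shaped like YYYY/MM/DD-HH:MM:SS.mmm followed by a level word such as INFO, WARN, or ERR; check that against your own generated logs before relying on it. The function name is made up for illustration.

```python
import re

# Assumed timestamp shape: YYYY/MM/DD-HH:MM:SS.mmm (verify against your logs).
TIMESTAMP = re.compile(r"\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}")


def find_problem_lines(lines, levels=("ERR", "WARN")):
    """Return (timestamp, line) pairs whose level word is in `levels`."""
    hits = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # not a timestamped entry, skip it
        # The first word after the timestamp is assumed to be the level.
        rest = line[match.end():].split(None, 1)
        if rest and rest[0] in levels:
            hits.append((match.group(0), line.rstrip()))
    return hits
```

You would feed it the lines of each generated log file (for example by opening the file and passing the file object) and compare the timestamps from both nodes side by side.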
ASKER
Thanks Endital for the quick answer.
Let's say that the SAN is OK. What else could cause the active node in my cluster to become unresponsive?
What are the most common issues with Windows 2008 clustering and Exchange 2007 CCR?
Cheers
Since there was a fibre alert, I would start by looking at the HBA drivers and making sure that they are up to date.
Are you using MPIO?
ASKER
Yes, we are using MPIO.
Are there any known issues with Windows 2008 MPIO and Exchange 2007 SP2 CCR?
None that I am aware of currently. I use MPIO with SCC.
I asked because with MPIO in place you should be able to withstand a single HBA failure. Do you have each HBA going into a separate switch? Did any other servers have issues at this time?
ASKER
Yes, we have each HBA going to a separate switch, and at the time this issue happened, another cluster also failed. We have a total of 6 clusters (all physical hardware, except HUB/CAS); the file share witnesses are on the HUB servers.
ASKER CERTIFIED SOLUTION
I would run the following:
Get-ClusteredMailboxServer