king daddy

Exchange 2010 failover clustering errors 1207 and 1135 troubleshooting

Greetings,

I have a two-server DAG. One of the servers is logging two event ID 1207 entries every 15 minutes. They state:

Cluster with name "ClusterName" could not be brought online. The computer object associated with the resource could not be updated in domain "domain.local" for the following reason: unable to update password for computer account.

The text for the associated error code is:  RPC server unavailable

The cluster identity may lack permissions required to update the object.

and

"Cluster network name resource '%1' cannot be brought online. The computer object associated with the resource could not be updated in domain '%2' for the following reason: %3.

The text for the associated error code is: %4

The cluster identity '%5' may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain."


The other server in the DAG had one entry for event ID 1135 stating:

"Cluster node "EXCH10-2" was removed from the active failover cluster membership. Cluster service may have stopped...."

When I go to the DAG computer object in AD, I only see the exchange server with event ID 1135 listed in the permissions.

Also, an nslookup for 192.168.35.20 (the DAG IP) fails, and an nslookup for the DAG name returns the previous IP. I'm not sure whether that matters for troubleshooting the 1207 error.
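For anyone wanting to repeat the nslookup comparison in a script, here is a minimal Python sketch. It assumes the DAG name resolves through the machine's normal resolver; `check_dns` and its `forward`/`reverse` parameters are hypothetical names introduced for illustration, not part of any Exchange or cluster tooling.

```python
import socket


def check_dns(name, expected_ip,
              forward=socket.gethostbyname,
              reverse=socket.gethostbyaddr):
    """Compare forward and reverse DNS results against the expected IP.

    `forward` and `reverse` are injectable so the check can be exercised
    without a live DNS server (an assumption made for this sketch).
    """
    result = {"forward_ip": None, "forward_matches": False, "reverse_name": None}
    try:
        # Forward lookup: name -> IP (the "nslookup DAG" step).
        result["forward_ip"] = forward(name)
        result["forward_matches"] = result["forward_ip"] == expected_ip
    except OSError:
        # socket.gaierror is a subclass of OSError; leave forward_ip as None.
        pass
    try:
        # Reverse lookup: IP -> name (the "nslookup 192.168.35.20" step).
        result["reverse_name"] = reverse(expected_ip)[0]
    except OSError:
        # socket.herror is a subclass of OSError; no PTR record was found.
        pass
    return result

# Example call, using the name and IP from the post:
#   check_dns("DAG", "192.168.35.20")
```

A result with `forward_matches` false and `reverse_name` empty would match what I described: a stale A record plus no reverse entry. It only shows what DNS returns; it says nothing about why the record was not updated.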

I am not sure if I should add the 1207 exchange server to the AD DAG computer object permissions or what. Not really sure where to start. I've looked at several resources online but am confused as to which AD objects I should look at and potentially edit, if any. This started 2 days ago.

Thanks a lot!
king daddy (ASKER)

I restarted the cluster service on the exch10-2 node. It only has one public database. It was dismounted for a few minutes, then went through recovery steps to re-mount it, which it did successfully. I am now seeing event 1207 on the other DAG node. Initially, I only saw this error on the exch10-2 node.

Further, I am now seeing these event IDs

1564 (FSW failed to arbitrate for the file share \\FSW.domain.local\DAG.domain.local. please ensure it exists)

1069 (cluster resource FSW \\FSW.domain.local\DAG.domain.local in clustered service or application 'cluster group' failed)

1573 (node exch10-2 failed to form a cluster. this is because the witness was not accessible. please ensure the witness is online and available)

The DAG share appears when I access the FSW server through a UNC path, although I am unable to open it (likely because I am logged on as administrator).
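A rough way to tell "share is missing" apart from "share exists but I cannot open it" is to attempt a directory listing and look at which error comes back. The Python sketch below does that; `probe_share` is a hypothetical helper name, and it runs as whoever launches it, so it does not prove what the cluster identity itself can or cannot access.

```python
import os


def probe_share(path):
    """Classify a directory path as 'readable', 'missing', 'access-denied', or 'error'."""
    try:
        # Listing the directory requires both existence and read permission.
        os.listdir(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "access-denied"
    except OSError:
        # Any other failure (network error, not a directory, and so on).
        return "error"
    return "readable"

# Example call, using the share path from event 1564:
#   probe_share(r"\\FSW.domain.local\DAG.domain.local")
```

An "access-denied" result from an administrator account would be consistent with the share existing and simply being restricted to other accounts, which is different from the "please ensure it exists" case in event 1564.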

To note, I have not received event ID 1207 in the last 30 minutes. It has skipped the last two entries. Errors 1564 and 1069 appeared 3 minutes before the expected 1207.

I think DAG is broken and not even trying.

Thanks for any help
ASKER CERTIFIED SOLUTION
James H

This solution is only available to members.
Event ID 1207 is showing up every 15 minutes on the first node now. It had not done this until I restarted the cluster service on the second node.
Or just open EMC and go to Organization Configuration > Mailbox > Database Availability Groups.
You should see the DAG name, member servers, and witness directory.
Thanks for replying Spartan_1337.

Everything is correct in the output of that command. The witness directory is correct.
EMC shows both nodes and the correct FSW and DAG share as well. Still no 1207 on the second node. They are showing up on the first node every 15 minutes still.

Thanks
Found this earlier:

http://technet.microsoft.com/en-us/library/dd353973(v=ws.10).aspx

Everything checked out. Both nodes show up.

I wonder if moving the FSW share will do anything to correct this. I am looking for a list of the correct permissions in AD, and which objects to apply them to.
So I rebooted the FSW server, and I am now receiving error 1207 on the second node again, but no longer on the first node.

DAG in EMC shows network up. All databases show mounted / healthy.

Not sure where to go from here.
SOLUTION
Philip Elder

This solution is only available to members.
I had not run across this. Thanks Philip. I will look over it soon.
I checked the DAG DNS entry and the check box for 'delete this record when it becomes stale' was selected and the record time stamp is dated 2/3/2014 9:00 PM. Scavenging is not enabled on the forward lookup zone though, so I am not sure if it matters. Also, there was no reverse lookup zone entry for the DAG. I couldn't ping by IP earlier.

I also noticed that the DAG object properties in AD>Computers>DAG 'right-click'>Properties>Object tab show the object was created 7/11/2013 and modified 2/3/2014 at 8:41 PM.
SOLUTION
This solution is only available to members.
Thanks SreRaj. I noticed that Exch10-2 was not listed at all on the DAG computer object in AD, while the other node was. I'm going to add Exch10-2 and duplicate the permissions.

I had seen that link but it only mentioned checking the quota and that the DAG computer object should have full permissions. There was another article that mentioned, as you did, that the nodes should have certain permissions as well. That said, how do I check the quota? I only see an option to change the quota in ADSIedit, not view the current usage.

Thanks a lot!
SreRaj, I added exch10-2 and matched the permissions of the first node. I am still getting the same 1207 errors as before. Thx
While editing the ms-DS-MachineAccountQuota attribute, you can see the value currently set for it; by default it is 10.

Please try modifying this value and also try restarting the node exch10-2.
I will check that SreRaj. I will reboot the node when possible. If the reboot does not clear the error, I am going to call Microsoft and have support check it out. Will update.

thx
I have not been able to reboot the server yet. However, all of a sudden, on the 20th the errors stopped on the second node and have not reappeared on either node. DAG info in EMC looks good. Again, I did nothing (no reboot, no disabling / re-enabling NIC, etc.).
None of these fixed the issue directly, but I awarded points for the replies and troubleshooting tips.