hqpsystems

2003 Cluster fails after 4 successful days running under a new Cluster Service account

Our corporate policy is to change 'system' passwords when someone leaves. I was asked to change the password of the Domain Administrator account for my 2003 domain, which was also being used as the Cluster Service account for my two Windows 2003 cluster nodes. I decided to create a different account, clusadmin, to use for the cluster, and gave it the correct rights as per MS Knowledge Base article 269229.
I stopped the Cluster Service on both nodes, changed the service account and brought up the cluster without issue. Later, I changed the domain administrator password. Everything went well for four days (about 100 hours), and then all the cluster resources started failing.
The error for each clustered resource was '9016 DNS signature failed to verify.'
The only way I managed to get round the problem was to change the Cluster Service account back to the domain administrator account (using the new password), restart the first node, and all came up fine. The second node was then brought up successfully.
What could cause this behaviour after 4 successful days? I gave the clusadmin account all the rights I believe it should have had. Does some background process run after about 100 hours that could cause this? Thanks.
ASKER CERTIFIED SOLUTION
65td
This solution is only available to members.
hqpsystems

ASKER

I am currently reluctant to try the new cluster account again, especially as we are a 24-hour site and getting any downtime (intended downtime, that is!) is very tricky and political. In DNS, I enabled the Advanced view so I could see the TTL for the cluster and its resources; they are all set to 20 minutes. Is this the default, and should it be changed? It doesn't seem to explain why the resources stayed up on the new account for around 100 hours.
The c:\windows\cluster\cluster.log file will probably say what went wrong when the cluster failed. If the "network name" resources are what failed, unchecking "require DNS registration to succeed" will work around that temporarily, but you will still need to fix the underlying problem eventually.
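If it helps with digging through the log, here is a rough sketch of a script that pulls out only the lines mentioning the error. The search terms ("9016", "dns", "network name") and the function names are my own assumptions for illustration, and nothing here depends on the exact layout of cluster.log; it is just a case-insensitive text search.

```python
from pathlib import Path


def find_matching_lines(log_text, keywords):
    """Return (line_number, line) pairs for lines containing any keyword.

    Matching is a plain case-insensitive substring test, so it makes no
    assumption about how the log lines are structured.
    """
    lowered = [k.lower() for k in keywords]
    matches = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        text = line.lower()
        if any(k in text for k in lowered):
            matches.append((number, line))
    return matches


def scan_cluster_log(path, keywords):
    """Read the log file and return the matching lines."""
    # errors="replace" avoids a crash if the file contains odd bytes;
    # the file's encoding is not known here, so this is a guess.
    text = Path(path).read_text(errors="replace")
    return find_matching_lines(text, keywords)


if __name__ == "__main__":
    # Path taken from the comment above; the keywords are assumptions.
    log_path = r"c:\windows\cluster\cluster.log"
    for number, line in scan_cluster_log(log_path, ["9016", "dns", "network name"]):
        print(f"{number}: {line}")
```

Run it on a copy of the log from the node that owned the resources when they failed, then look at the timestamps on the first few matches to see what happened just before the failures started.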