Cannot add SQL Cluster Node 2 after evicting it

We had an issue with our Active/Passive SQL Server 2005 cluster and had to reboot both nodes. The issue was related to our disks on the SAN. When both nodes came back up and the disks were showing online on Node 1, we were able to start the cluster service and bring the resources online on Node 1, but Node 2 failed to start the service. We decided to evict the node and try to rejoin it. It would not let us do that, and it logs this error message constantly even though the service is not started.

00000ba0.00000b78::2011/07/12-20:29:11.360 INFO [CS] Cluster Service started - Cluster Node Version 4.3790
00000ba0.00000b78::2011/07/12-20:29:11.360 INFO OS Version 5.2.3790 - Service Pack 2 (ADS 03000112L)
00000ba0.00000b78::2011/07/12-20:29:11.360 INFO Local Time is 2011/07/12-15:29:11.360
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [CS] Service Starting...
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [INIT] ClusterInitialize called to start cluster.
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [EP] Initialization...
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [DM] Initialization
00000ba0.00000db0::2011/07/12-20:29:11.360 ERR [DM] DmInitialize: The hive was loaded- rollback, unload and reload again
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [DM] DmpRestartFlusher: Entry
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [DM] DmpUnloadHive: unloading the hive
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [Qfs] QfsSetFileAttributes C:\WINDOWS\Cluster\CLUSDB.BKP$ 80, status 2
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [Qfs] QfsDeleteFile C:\WINDOWS\Cluster\CLUSDB.BKP$, status 2
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [DM] Loading cluster database from C:\WINDOWS\Cluster\CLUSDB
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [DM] DmpStartFlusher: Entry
00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [DM] DmpStartFlusher: thread created
00000ba0.00000db0::2011/07/12-20:29:11.360 ERR [DM] Failed to open key Resources, status 2
00000ba0.00000db0::2011/07/12-20:29:11.376 ERR Cluster service suffered an unexpected fatal error at line 1386 of source module d:\nt\base\cluster\service\dm\dminit.c. The error code was 2.
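For a longer cluster.log, it can help to filter out everything except the ERR entries. The following is a minimal Python sketch, not an official tool. The line layout (process id, thread id, timestamp, level, message) is an assumption inferred only from the excerpt above, and the names `LINE_RE`, `LogEntry`, `parse_line` and `errors_only` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the excerpt above:
# <process id>.<thread id>::<date>-<time> <LEVEL> <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<timestamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    pid: str
    tid: str
    timestamp: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not fit the assumed layout."""
    m = LINE_RE.match(line.strip())
    return LogEntry(**m.groupdict()) if m else None


def errors_only(lines):
    """Keep only the entries whose level is ERR."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level == "ERR"]


# Three lines copied from the log in the question.
SAMPLE = [
    "00000ba0.00000db0::2011/07/12-20:29:11.360 INFO [DM] Initialization",
    "00000ba0.00000db0::2011/07/12-20:29:11.360 ERR [DM] DmInitialize: The hive was loaded- rollback, unload and reload again",
    "00000ba0.00000db0::2011/07/12-20:29:11.360 ERR [DM] Failed to open key Resources, status 2",
]

for entry in errors_only(SAMPLE):
    print(entry.timestamp, entry.message)
```

To run it against a real log, the lines of the file can be passed to `errors_only` instead of `SAMPLE`.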

Does anyone have any idea what could be causing this?

gtworek

Did you try to force cleanup on this node?
"cluster node /force" on the node you'd like to clean. Be aware that this will destroy the cluster information on this node, so do not run it on a "live" node ;)
http://technet.microsoft.com/en-us/library/cc739895(WS.10).aspx

soumyaghosh

ASKER

Thanks for the link, I will try this out and let you know.

Delphineous Silverwing

We had this issue when using Symantec Endpoint Protection on our SQL cluster. If you are using SEP, there is an update available from Symantec.

soumyaghosh

ASKER

Also, I keep getting this error in Event Viewer even though the cluster service is not running:

Cluster service suffered an unexpected fatal error at line 1386 of source module d:\nt\base\cluster\service\dm\dminit.c. The error code was 2.

ASKER CERTIFIED SOLUTION

gtworek

This solution is only available to members.

soumyaghosh

ASKER

I did try resetting the quorum log; it did not help. :(

gtworek

It is a pretty common recommendation, but as I expected, it did not help in your case.
BTW, dminit is the part responsible for initializing the Cluster Database Manager. It tries to open the "Resources" registry key and fails.
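As a rough illustration of how to read that fatal error message, here is a small Python sketch that pulls out the source module, line number and error code. It assumes the code is a standard Win32 system error code, where 2 is ERROR_FILE_NOT_FOUND, which would fit a key that cannot be opened. The names `WIN32_ERRORS`, `FATAL_RE` and `describe_fatal_error` are hypothetical, and the lookup table is only a tiny subset of the codes.

```python
import re
from ntpath import basename  # handles Windows-style paths on any platform

# A few well-known Win32 system error codes (subset only).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
}

# Pattern assumed from the single message quoted in this thread.
FATAL_RE = re.compile(
    r"fatal error at line (?P<line>\d+) of source module (?P<module>\S+)\. "
    r"The error code was (?P<code>\d+)\."
)


def describe_fatal_error(message):
    """Return module, line, code and symbolic name, or None if no match."""
    m = FATAL_RE.search(message)
    if m is None:
        return None
    code = int(m.group("code"))
    return {
        "module": basename(m.group("module")),
        "line": int(m.group("line")),
        "code": code,
        "name": WIN32_ERRORS.get(code, "unknown"),
    }


# Message copied from the question.
MESSAGE = (
    r"Cluster service suffered an unexpected fatal error at line 1386 of "
    r"source module d:\nt\base\cluster\service\dm\dminit.c. "
    r"The error code was 2."
)

print(describe_fatal_error(MESSAGE))
```

This only decodes the message; it does not tell you why the key is missing from the local copy of the cluster database.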

soumyaghosh

ASKER

The issue is still not resolved. This answer should be considered a suggestion, not the resolution.