Solved

Microsoft Cluster - Resources stuck on online pending - HELP.......

Posted on 2008-10-29
Last Modified: 2013-12-02
Hi all,

Newbie here to Experts Exchange - just signed up.  I am really hoping you guys can help me.  Let me explain the issue quickly.

I have an HP DL380 G4 Packaged Cluster solution acting as an Active/Active File & Print cluster.  It runs Windows Server 2003 Enterprise R2 SP2.  Just recently, out of the blue, the 1st node stopped accepting any failover resources.  It hangs on any disk being failed over.  The only error I get is the one below.

Event Type:      Warning
Event Source:      PlugPlayManager
Event Category:      None
Event ID:      256
Date:            28/10/2008
Time:            22:30:43
User:            N/A
Computer:      xxxxxxxxxxx (removed for privacy)
Description:
Timed out sending notification of device interface change to window of "ClusterDiskPnPWatcher"

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

The node basically hangs after this and the resources fail.  After Googling, Yahooing, MSNing and CUILing I get a bunch of useless information.  Does anyone have any ideas how to troubleshoot this?  Below are the steps I have already performed.

Evicted bad node from cluster, manually cleaned up the node, rejoined node: NO CHANGE
Disabled MSDTC (this is set up as a cluster resource): NO CHANGE
Disabled McAfee (in case it was holding onto an open file on the disks): NO CHANGE
Created a separate hardware profile on the bad node: NO CHANGE

I would really appreciate any assistance in this.
Question by:Ireland18
 

Author Comment

by:Ireland18
ID: 22836263
UPDATE:  I have upgraded the Smart Array 6i controller driver on both nodes - NO CHANGE
 
LVL 22

Expert Comment

by:65td
ID: 22840064
Will the default cluster group move to passive node?
 

Author Comment

by:Ireland18
ID: 22840308
No, not even the Quorum will move.  Any group I try to move that has a Physical Disk resource assigned to it just seems to hang at "Online Pending".  Crazy.
 
LVL 22

Expert Comment

by:65td
ID: 22840327
System event log messages?
Review the cluster log on the active node under c:\windows\cluster - cluster.log
 

Author Comment

by:Ireland18
ID: 22840947
I have examined the logs of the failing node. I have also googled all of the errors that seemed important and have gotten nowhere beyond the troubleshooting steps above.  Here is the log; I have removed any references to server names, etc.
908:948.10/29[22:13:48.750](225936) WARN [JOIN] Attempting join with sponsor 10.10.10.11.
908:948.10/29[22:13:48.921](225936) WARN [ClNet] Tcpip is not bound to adapter 153C8860-8D27-4F45-B5BE-2AEDB5D14508.
908:948.10/29[22:13:48.921](225936) WARN [MM] MmQuorumArbitrationTimeout 60.
908:948.10/29[22:13:49.859](226077) WARN [NM] Cryptor: Data is not encrypted.
908:948.10/29[22:13:49.859](226077) WARN [NM] Cryptor received unencrypted data.
908:948.10/29[22:13:50.125](226077) WARN [DM] Obtained new database.
908:948.10/29[22:13:50.140](226077) WARN [DM] DmpSafeDatabaseCopy:: SetFileAttrib on BkpPath C:\WINDOWS\Cluster\CLUSDB.BKP$ failed, Status=2
f40:f5c.10/29[22:13:53.781](226089) ERR  IP Address <Print Mgmt - IP Address>: Unable to open node parameters key, status 2.
f40:f58.10/29[22:13:53.781](226089) WARN Network Name <Print Mgmt - Network Name>: Unable to read ResourceData parameter, error=2
f40:f58.10/29[22:13:53.781](226089) WARN Network Name <Print Mgmt - Network Name>: Unable to read CreatingDC parameter, error=2
f40:f5c.10/29[22:13:53.843](226089) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read ResourceData parameter, error=2
f40:f5c.10/29[22:13:53.843](226089) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read CreatingDC parameter, error=2
908:a18.10/29[22:13:55.671](226089) WARN [FM] FmDeleteResourceType: Resource type Microsoft Message Queue Server does not exist...
908:948.10/29[22:13:55.703](226089) WARN [EVT] Set propagation state to 0001
908:f60.10/29[22:13:57.656](226090) WARN [FM] FmDeleteResourceType: Resource type IIS Server Instance does not exist...
908:a2c.10/29[22:13:57.718](226090) WARN [FM] FmDeleteResourceType: Resource type SMTP Server Instance does not exist...
908:a18.10/29[22:13:57.765](226090) WARN [FM] FmDeleteResourceType: Resource type NNTP Server Instance does not exist...
908:f60.10/29[22:13:57.812](226090) WARN [FM] FmDeleteResourceType: Resource type IIS Virtual Root does not exist...
908:a2c.10/29[22:13:57.859](226090) WARN [FM] FmDeleteResourceType: Resource type Time Service does not exist...
f40:780.10/30[08:17:50.389](226358) WARN Physical Disk <Support Drive>: [DiskArb] Assume ownership of the device.
968:96c.10/30[08:20:45.531](226358) INFO [CS] Cluster Service started - Cluster Node Version 4.3790
968:9e0.10/30[08:20:47.500](226358) WARN [NM] Failed to open cluster parameters key, status 2.
968:9e0.10/30[08:20:47.593](226358) WARN [JOIN] Attempting join with sponsor 10.10.10.11.
968:9e0.10/30[08:20:47.750](226358) WARN [ClNet] Tcpip is not bound to adapter 153C8860-8D27-4F45-B5BE-2AEDB5D14508.
968:9e0.10/30[08:20:47.765](226358) WARN [MM] MmQuorumArbitrationTimeout 60.
968:9e0.10/30[08:20:48.656](226375) WARN [NM] Cryptor: Data is not encrypted.
968:9e0.10/30[08:20:48.656](226375) WARN [NM] Cryptor received unencrypted data.
968:9e0.10/30[08:20:48.937](226375) WARN [DM] Obtained new database.
968:9e0.10/30[08:20:48.937](226375) WARN [DM] DmpSafeDatabaseCopy:: SetFileAttrib on BkpPath C:\WINDOWS\Cluster\CLUSDB.BKP$ failed, Status=2
f34:f50.10/30[08:20:52.515](226387) ERR  IP Address <Print Mgmt - IP Address>: Unable to open node parameters key, status 2.
f34:f4c.10/30[08:20:52.515](226387) WARN Network Name <Print Mgmt - Network Name>: Unable to read ResourceData parameter, error=2
f34:f4c.10/30[08:20:52.515](226387) WARN Network Name <Print Mgmt - Network Name>: Unable to read CreatingDC parameter, error=2
f34:f50.10/30[08:20:52.562](226387) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read ResourceData parameter, error=2
f34:f50.10/30[08:20:52.562](226387) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read CreatingDC parameter, error=2
968:a68.10/30[08:20:54.109](226387) WARN [FM] FmDeleteResourceType: Resource type Microsoft Message Queue Server does not exist...
968:9e0.10/30[08:20:54.140](226387) WARN [EVT] Set propagation state to 0001
968:f54.10/30[08:20:56.171](226388) WARN [FM] FmDeleteResourceType: Resource type IIS Server Instance does not exist...
968:a7c.10/30[08:20:56.234](226388) WARN [FM] FmDeleteResourceType: Resource type SMTP Server Instance does not exist...
968:a68.10/30[08:20:56.281](226388) WARN [FM] FmDeleteResourceType: Resource type NNTP Server Instance does not exist...
968:f54.10/30[08:20:56.328](226388) WARN [FM] FmDeleteResourceType: Resource type IIS Virtual Root does not exist...
968:a7c.10/30[08:20:56.375](226388) WARN [FM] FmDeleteResourceType: Resource type Time Service does not exist...
f34:a70.10/30[10:16:00.152](226458) WARN Physical Disk <Support Drive>: [DiskArb] Assume ownership of the device.
f34:5bc.10/30[10:21:26.808](226476) WARN Physical Disk <Support Drive>: Offline, Locking volume failed, error 5.
f34:5bc.10/30[10:21:27.417](226477) WARN Physical Disk <Support Drive>: Offline, Locking volume failed, error 5.
f34:718.10/30[10:21:35.542](226508) WARN Physical Disk <Support Drive>: [DiskArb] Assume ownership of the device.
f34:194.10/30[10:24:35.545](226512) WARN [RM] RmpTimerThread: Resource Support Drive pending timed out, CP 0 - setting state to failed.
8d0:8d4.10/30[10:30:42.015](226514) INFO [CS] Cluster Service started - Cluster Node Version 4.3790
8d0:8e8.10/30[10:30:43.718](226514) WARN [NM] Failed to open cluster parameters key, status 2.
8d0:8e8.10/30[10:30:43.890](226514) WARN [JOIN] Attempting join with sponsor 10.10.10.11.
8d0:8e8.10/30[10:30:44.218](226514) WARN [ClNet] Tcpip is not bound to adapter 153C8860-8D27-4F45-B5BE-2AEDB5D14508.
8d0:8e8.10/30[10:30:44.234](226514) WARN [MM] MmQuorumArbitrationTimeout 60.
8d0:8e8.10/30[10:30:45.218](226531) WARN [NM] Cryptor: Data is not encrypted.
8d0:8e8.10/30[10:30:45.218](226531) WARN [NM] Cryptor received unencrypted data.
8d0:8e8.10/30[10:30:45.390](226531) WARN [DM] Obtained new database.
8d0:8e8.10/30[10:30:45.390](226531) WARN [DM] DmpSafeDatabaseCopy:: SetFileAttrib on BkpPath C:\WINDOWS\Cluster\CLUSDB.BKP$ failed, Status=2
fd0:fec.10/30[10:30:48.453](226543) ERR  IP Address <Print Mgmt - IP Address>: Unable to open node parameters key, status 2.
fd0:fe8.10/30[10:30:48.453](226543) WARN Network Name <Print Mgmt - Network Name>: Unable to read ResourceData parameter, error=2
fd0:fe8.10/30[10:30:48.453](226543) WARN Network Name <Print Mgmt - Network Name>: Unable to read CreatingDC parameter, error=2
fd0:fec.10/30[10:30:48.484](226543) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read ResourceData parameter, error=2
fd0:fec.10/30[10:30:48.484](226543) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read CreatingDC parameter, error=2
8d0:e40.10/30[10:30:50.187](226543) WARN [FM] FmDeleteResourceType: Resource type Microsoft Message Queue Server does not exist...
8d0:8e8.10/30[10:30:50.218](226543) WARN [EVT] Set propagation state to 0001
8d0:e40.10/30[10:30:52.406](226544) WARN [FM] FmDeleteResourceType: Resource type IIS Server Instance does not exist...
8d0:e4c.10/30[10:30:52.453](226544) WARN [FM] FmDeleteResourceType: Resource type SMTP Server Instance does not exist...
8d0:e40.10/30[10:30:52.500](226544) WARN [FM] FmDeleteResourceType: Resource type NNTP Server Instance does not exist...
8d0:e4c.10/30[10:30:52.546](226544) WARN [FM] FmDeleteResourceType: Resource type IIS Virtual Root does not exist...
8d0:e40.10/30[10:30:52.593](226544) WARN [FM] FmDeleteResourceType: Resource type Time Service does not exist...
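
One way to pull timing out of a log like the one above is a small parser. This is only a sketch: the line layout (`pid:tid.MM/DD[HH:MM:SS.mmm](sequence) LEVEL message`) is inferred from the excerpt, the year is not in the log and is passed in as an assumption, and the function names are made up for illustration. It measures how long the Support Drive sat between "Assume ownership" and "pending timed out".

```python
import re
from datetime import datetime

# Layout inferred from the excerpt: pid:tid.MM/DD[HH:MM:SS.mmm](seq) LEVEL message
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]+):(?P<tid>[0-9a-f]+)\."
    r"(?P<date>\d{2}/\d{2})\[(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\]"
    r"\((?P<seq>\d+)\)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def parse_line(line, year=2008):
    """Return a dict for one cluster.log line, or None if it does not match.

    The log carries no year, so `year` is an assumption supplied by the caller.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(
        f"{year}/{m['date']} {m['time']}", "%Y/%m/%d %H:%M:%S.%f"
    )
    return {"when": stamp, "level": m["level"], "msg": m["msg"]}


def pending_durations(lines, resource):
    """Seconds from the latest 'Assume ownership' to each 'pending timed out'."""
    started = None
    durations = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or resource not in rec["msg"]:
            continue
        if "Assume ownership" in rec["msg"]:
            started = rec["when"]
        elif "pending timed out" in rec["msg"] and started is not None:
            durations.append((rec["when"] - started).total_seconds())
            started = None
    return durations


# Two lines copied from the log above.
sample = [
    "f34:718.10/30[10:21:35.542](226508) WARN Physical Disk <Support Drive>: [DiskArb] Assume ownership of the device.",
    "f34:194.10/30[10:24:35.545](226512) WARN [RM] RmpTimerThread: Resource Support Drive pending timed out, CP 0 - setting state to failed.",
]
print(pending_durations(sample, "Support Drive"))
```

For these two lines the gap works out to almost exactly 180 seconds. That would be consistent with the disk resource simply running out its pending timeout (assuming the usual 3-minute default is in effect) rather than failing with an explicit error, though the log alone does not prove it.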
 
LVL 22

Expert Comment

by:65td
ID: 22841891
Review NTFS permissions on the <Support Drive>.
 

Author Comment

by:Ireland18
ID: 22842220
OK - so I have added the cluster service account (which is already a member of local administrators) directly into the root NTFS permissions on the Support Drive.  I then tested a failover and the result is the same - stuck in "Online Pending".
 
LVL 22

Expert Comment

by:65td
ID: 22842476
Does the cluster service account have full control all the way through the disk (from the root)?
Review the cluster configuration (properties): the Q drive is not local, correct?

There are lots of "The system cannot find the file specified" (status 2) and access denied (error 5) entries in the cluster log.
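
To make those codes easier to spot, here is a minimal sketch that tags log lines ending in a status or error number with its standard Win32 meaning. Only the two codes that appear in this thread are mapped, and the helper name `annotate` is made up for illustration.

```python
import re

# Standard Win32 error codes; only the two seen in this thread are listed.
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND (The system cannot find the file specified)",
    5: "ERROR_ACCESS_DENIED (Access is denied)",
}

# Matches trailing "status 2.", "error=2", "Status=2", "error 5." and similar.
CODE_RE = re.compile(r"\b(?:status|error)\s*=?\s*(\d+)\.?\s*$", re.IGNORECASE)


def annotate(line):
    """Append the meaning of a trailing status/error code, if one is found."""
    m = CODE_RE.search(line)
    if m is None:
        return line
    code = int(m.group(1))
    meaning = WIN32_ERRORS.get(code, "code not in this small table")
    return f"{line.rstrip()}  <-- {meaning}"


print(annotate("WARN Physical Disk <Support Drive>: Offline, Locking volume failed, error 5."))
print(annotate("ERR  IP Address <Print Mgmt - IP Address>: Unable to open node parameters key, status 2."))
```

Lines without a trailing code come back unchanged, so the whole log can be passed through it.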
 

Author Comment

by:Ireland18
ID: 22844582
Yes. Originally the cluster service account was a member of the local admins group, which had full access through every drive from the root.  I have now added that account explicitly to the root of the Quorum and Support drives, with no change in the problem when failing over.
I have the drives configured as below:
D:  Data - 1TB
E:  Print - 50GB
F:  Support -  50GB  -  Applications are installed here.
Q:  Quorum  - 10GB
 
 
LVL 22

Expert Comment

by:65td
ID: 22849338
"Evicted Bad node from cluster, manually "cleandup Node".  Rejoined Node."
Manually cleaned up the node?
Did you use cluster /node [node-name] /FORCE from the cmd prompt?
 

Author Comment

by:Ireland18
ID: 22849717
Yep - I have already done the below:
Evicted node:  C:\> cluster node xxxxxxxxxx /forcecleanup
Removed the node from the domain, re-added it to the domain and rejoined it to the cluster.
 
LVL 22

Expert Comment

by:65td
ID: 22851404
The node came in OK; review clcfgsrv.log under windows\system32\logfiles\cluster\.

The bad node should at least take the default group.

Is the node paused?
 

Author Comment

by:Ireland18
ID: 22851511
Yeah, it won't take the default group, or any group for that matter.
The node is not paused - I just double checked.  To be honest, the log doesn't make much sense to me and is quite large - I will take some time to review it now.  Thanks again for your continued help.
I would post the log but it contains a lot of references to the company name, IP addresses and server names.  I got caught out posting such material before, with a not so good outcome!
 
LVL 22

Expert Comment

by:65td
ID: 22851579
That's fine. I suggest starting from the end of the log and working back.
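
For a large log, working backwards can be scripted. A minimal sketch, assuming the log has already been read into a list of lines and that interesting entries carry a marker such as `[WARN]` (the only marker confirmed in this thread); the function name and the sample lines are made up for illustration.

```python
def newest_first(lines, markers=("[WARN]",), limit=20):
    """Walk the log from the last line back; return up to `limit` matching lines."""
    hits = []
    for line in reversed(list(lines)):
        if any(marker in line for marker in markers):
            hits.append(line.rstrip("\n"))
            if len(hits) == limit:
                break
    return hits


# Hypothetical, shortened lines in the clcfgsrv.log style.
sample = [
    "2008-10-28 13:17:57.681 [WARN] first warning\n",
    "2008-10-28 13:17:57.700 [INFO] something routine\n",
    "2008-10-28 13:17:58.771 [WARN] last warning\n",
]
for entry in newest_first(sample, limit=2):
    print(entry)
```

The most recent matching entry is printed first, so the events closest to the failure are the first thing on screen.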
 

Author Comment

by:Ireland18
ID: 22851587
I just noticed a lot of these in the clcfgsrv.log.  Do they mean anything to you?
 
PC-PreCreate] PHYSICALDRIVE3: Resource does not want to be managed. Skipping. (hr=0x000001, {CD36919C-9F31-46B4-A29D-AC87F4E6CC93}, {9DAA8CDA-1004-4543-BCFF-4ECF774AA8A7}, 0, 1, 1), (null)
2008-10-28 13:17:57.681 [WARN] (Server Name): [SRV] Enumerating resources. Total Requested:1; Current enum index:1; Total Enums:4. (hr=0x000001, {05AA0768-5F49-49CD-AFDC-96F9D51802D4}, {00000000-0000-0000-0000-000000000000}, 1, 1, 1), (null)
2008-10-28 13:17:57.759 [WARN] (Server Name): [PC-PreCreate] Disk F:: Resource does not want to be managed. Skipping. (hr=0x000001, {CD36919C-9F31-46B4-A29D-AC87F4E6CC93}, {9DAA8CDA-1004-4543-BCFF-4ECF774AA8A7}, 0, 1, 1), (null)
2008-10-28 13:17:58.709 [WARN] (Server Name): [SRV] Enumerating resources. Total Requested:1; Current enum index:1; Total Enums:4. (hr=0x000001, {05AA0768-5F49-49CD-AFDC-96F9D51802D4}, {00000000-0000-0000-0000-000000000000}, 1, 1, 1), (null)
2008-10-28 13:17:58.771 [WARN] (Server Name): [PC-PreCreate] Majority Node Set: Resource does not want to be managed. Skipping. (hr=0x000001, {CD36919C-9F31-46B4-A29D-AC87F4E6CC93}, {9DAA8CDA-1004-4543-BCFF-4ECF774AA8A7}, 0, 1, 1), (null)  
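
To see at a glance which resources these messages refer to, the names can be pulled out with a short script. This is a sketch based only on the three "does not want to be managed" lines pasted above; the pattern and the function name are assumptions, and it says nothing about whether the messages are harmful.

```python
import re

# Resource name sits between "PC-PreCreate]" and ": Resource does not want to be managed".
SKIP_RE = re.compile(
    r"PC-PreCreate\]\s+(?P<name>.+?): Resource does not want to be managed"
)


def skipped_resources(lines):
    """Return resource names flagged as 'does not want to be managed', in order."""
    names = []
    for line in lines:
        m = SKIP_RE.search(line)
        if m and m.group("name") not in names:
            names.append(m.group("name"))
    return names


# Shortened copies of the lines pasted above.
sample = [
    "PC-PreCreate] PHYSICALDRIVE3: Resource does not want to be managed. Skipping. (hr=0x000001, ...)",
    "2008-10-28 13:17:57.759 [WARN] (Server Name): [PC-PreCreate] Disk F:: Resource does not want to be managed. Skipping.",
    "2008-10-28 13:17:58.771 [WARN] (Server Name): [PC-PreCreate] Majority Node Set: Resource does not want to be managed. Skipping.",
]
print(skipped_resources(sample))
```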
 
LVL 22

Accepted Solution

by:
65td earned 500 total points
ID: 22852078
What type of quorum is being used: shared on SAN, local, or Majority Node Set?
You could evict the node again, rename clcfgsrv.log, and bring the node in again (after performing the cluster /node [node-name] /FORCE and restarting).