Solved

Microsoft Cluster - Resources stuck on Online Pending - HELP

Posted on 2008-10-29
Last Modified: 2013-12-02
Hi all,

Newbie here to Experts Exchange, just signed up.  I am really hoping you guys can help me.  Let me explain the issue quickly.

I have an HP DL380 G4 Packaged Cluster solution acting as an Active/Active File & Print cluster.  It runs Windows Server 2003 R2 Enterprise Edition, SP2.  Just recently, out of the blue, the first node stopped accepting any failover resources.  It hangs on any disk being failed over.  The only error I get is the one below.

Event Type:      Warning
Event Source:      PlugPlayManager
Event Category:      None
Event ID:      256
Date:            28/10/2008
Time:            22:30:43
User:            N/A
Computer:      xxxxxxxxxxx (removed for privacy)
Description:
Timed out sending notification of device interface change to window of "ClusterDiskPnPWatcher"

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

The node basically hangs after this and the resources fail.  After Googling, Yahooing, MSNing and CUILing, all I get is a bunch of useless information.  Does anyone have any ideas on how to troubleshoot this?  Below are the steps I have already performed.

Evicted the bad node from the cluster, manually cleaned up the node, then rejoined it.  NO CHANGE
Disabled MSDTC (this is set up as a cluster resource).  NO CHANGE
Disabled McAfee (in case it was holding on to an open file on the disks).  NO CHANGE
Created a separate hardware profile on the bad node.  NO CHANGE

I would really appreciate any assistance in this.
Question by:Ireland18
16 Comments
 

Author Comment

by:Ireland18
ID: 22836263
UPDATE:  I have upgraded the Smart Array 6i controller driver on both nodes - NO CHANGE
 
LVL 22

Expert Comment

by:65td
ID: 22840064
Will the default cluster group move to passive node?
 

Author Comment

by:Ireland18
ID: 22840308
No, not even the quorum will move.  Any group I try to move that has a Physical Disk resource assigned to it just seems to hang at "Online Pending".  Crazy.
 
LVL 22

Expert Comment

by:65td
ID: 22840327
System event log messages?
Review the cluster log on the active node under c:\windows\cluster - cluster.log
 

Author Comment

by:Ireland18
ID: 22840947
I have examined the logs of the failing node and Googled all of the errors that seemed important, but have gotten nowhere beyond the troubleshooting steps above.  Here is the log; I have removed any references to server names, etc.
908:948.10/29[22:13:48.750](225936) WARN [JOIN] Attempting join with sponsor 10.10.10.11.
908:948.10/29[22:13:48.921](225936) WARN [ClNet] Tcpip is not bound to adapter 153C8860-8D27-4F45-B5BE-2AEDB5D14508.
908:948.10/29[22:13:48.921](225936) WARN [MM] MmQuorumArbitrationTimeout 60.
908:948.10/29[22:13:49.859](226077) WARN [NM] Cryptor: Data is not encrypted.
908:948.10/29[22:13:49.859](226077) WARN [NM] Cryptor received unencrypted data.
908:948.10/29[22:13:50.125](226077) WARN [DM] Obtained new database.
908:948.10/29[22:13:50.140](226077) WARN [DM] DmpSafeDatabaseCopy:: SetFileAttrib on BkpPath C:\WINDOWS\Cluster\CLUSDB.BKP$ failed, Status=2
f40:f5c.10/29[22:13:53.781](226089) ERR  IP Address <Print Mgmt - IP Address>: Unable to open node parameters key, status 2.
f40:f58.10/29[22:13:53.781](226089) WARN Network Name <Print Mgmt - Network Name>: Unable to read ResourceData parameter, error=2
f40:f58.10/29[22:13:53.781](226089) WARN Network Name <Print Mgmt - Network Name>: Unable to read CreatingDC parameter, error=2
f40:f5c.10/29[22:13:53.843](226089) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read ResourceData parameter, error=2
f40:f5c.10/29[22:13:53.843](226089) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read CreatingDC parameter, error=2
908:a18.10/29[22:13:55.671](226089) WARN [FM] FmDeleteResourceType: Resource type Microsoft Message Queue Server does not exist...
908:948.10/29[22:13:55.703](226089) WARN [EVT] Set propagation state to 0001
908:f60.10/29[22:13:57.656](226090) WARN [FM] FmDeleteResourceType: Resource type IIS Server Instance does not exist...
908:a2c.10/29[22:13:57.718](226090) WARN [FM] FmDeleteResourceType: Resource type SMTP Server Instance does not exist...
908:a18.10/29[22:13:57.765](226090) WARN [FM] FmDeleteResourceType: Resource type NNTP Server Instance does not exist...
908:f60.10/29[22:13:57.812](226090) WARN [FM] FmDeleteResourceType: Resource type IIS Virtual Root does not exist...
908:a2c.10/29[22:13:57.859](226090) WARN [FM] FmDeleteResourceType: Resource type Time Service does not exist...
f40:780.10/30[08:17:50.389](226358) WARN Physical Disk <Support Drive>: [DiskArb] Assume ownership of the device.
968:96c.10/30[08:20:45.531](226358) INFO [CS] Cluster Service started - Cluster Node Version 4.3790
968:9e0.10/30[08:20:47.500](226358) WARN [NM] Failed to open cluster parameters key, status 2.
968:9e0.10/30[08:20:47.593](226358) WARN [JOIN] Attempting join with sponsor 10.10.10.11.
968:9e0.10/30[08:20:47.750](226358) WARN [ClNet] Tcpip is not bound to adapter 153C8860-8D27-4F45-B5BE-2AEDB5D14508.
968:9e0.10/30[08:20:47.765](226358) WARN [MM] MmQuorumArbitrationTimeout 60.
968:9e0.10/30[08:20:48.656](226375) WARN [NM] Cryptor: Data is not encrypted.
968:9e0.10/30[08:20:48.656](226375) WARN [NM] Cryptor received unencrypted data.
968:9e0.10/30[08:20:48.937](226375) WARN [DM] Obtained new database.
968:9e0.10/30[08:20:48.937](226375) WARN [DM] DmpSafeDatabaseCopy:: SetFileAttrib on BkpPath C:\WINDOWS\Cluster\CLUSDB.BKP$ failed, Status=2
f34:f50.10/30[08:20:52.515](226387) ERR  IP Address <Print Mgmt - IP Address>: Unable to open node parameters key, status 2.
f34:f4c.10/30[08:20:52.515](226387) WARN Network Name <Print Mgmt - Network Name>: Unable to read ResourceData parameter, error=2
f34:f4c.10/30[08:20:52.515](226387) WARN Network Name <Print Mgmt - Network Name>: Unable to read CreatingDC parameter, error=2
f34:f50.10/30[08:20:52.562](226387) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read ResourceData parameter, error=2
f34:f50.10/30[08:20:52.562](226387) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read CreatingDC parameter, error=2
968:a68.10/30[08:20:54.109](226387) WARN [FM] FmDeleteResourceType: Resource type Microsoft Message Queue Server does not exist...
968:9e0.10/30[08:20:54.140](226387) WARN [EVT] Set propagation state to 0001
968:f54.10/30[08:20:56.171](226388) WARN [FM] FmDeleteResourceType: Resource type IIS Server Instance does not exist...
968:a7c.10/30[08:20:56.234](226388) WARN [FM] FmDeleteResourceType: Resource type SMTP Server Instance does not exist...
968:a68.10/30[08:20:56.281](226388) WARN [FM] FmDeleteResourceType: Resource type NNTP Server Instance does not exist...
968:f54.10/30[08:20:56.328](226388) WARN [FM] FmDeleteResourceType: Resource type IIS Virtual Root does not exist...
968:a7c.10/30[08:20:56.375](226388) WARN [FM] FmDeleteResourceType: Resource type Time Service does not exist...
f34:a70.10/30[10:16:00.152](226458) WARN Physical Disk <Support Drive>: [DiskArb] Assume ownership of the device.
f34:5bc.10/30[10:21:26.808](226476) WARN Physical Disk <Support Drive>: Offline, Locking volume failed, error 5.
f34:5bc.10/30[10:21:27.417](226477) WARN Physical Disk <Support Drive>: Offline, Locking volume failed, error 5.
f34:718.10/30[10:21:35.542](226508) WARN Physical Disk <Support Drive>: [DiskArb] Assume ownership of the device.
f34:194.10/30[10:24:35.545](226512) WARN [RM] RmpTimerThread: Resource Support Drive pending timed out, CP 0 - setting state to failed.
8d0:8d4.10/30[10:30:42.015](226514) INFO [CS] Cluster Service started - Cluster Node Version 4.3790
8d0:8e8.10/30[10:30:43.718](226514) WARN [NM] Failed to open cluster parameters key, status 2.
8d0:8e8.10/30[10:30:43.890](226514) WARN [JOIN] Attempting join with sponsor 10.10.10.11.
8d0:8e8.10/30[10:30:44.218](226514) WARN [ClNet] Tcpip is not bound to adapter 153C8860-8D27-4F45-B5BE-2AEDB5D14508.
8d0:8e8.10/30[10:30:44.234](226514) WARN [MM] MmQuorumArbitrationTimeout 60.
8d0:8e8.10/30[10:30:45.218](226531) WARN [NM] Cryptor: Data is not encrypted.
8d0:8e8.10/30[10:30:45.218](226531) WARN [NM] Cryptor received unencrypted data.
8d0:8e8.10/30[10:30:45.390](226531) WARN [DM] Obtained new database.
8d0:8e8.10/30[10:30:45.390](226531) WARN [DM] DmpSafeDatabaseCopy:: SetFileAttrib on BkpPath C:\WINDOWS\Cluster\CLUSDB.BKP$ failed, Status=2
fd0:fec.10/30[10:30:48.453](226543) ERR  IP Address <Print Mgmt - IP Address>: Unable to open node parameters key, status 2.
fd0:fe8.10/30[10:30:48.453](226543) WARN Network Name <Print Mgmt - Network Name>: Unable to read ResourceData parameter, error=2
fd0:fe8.10/30[10:30:48.453](226543) WARN Network Name <Print Mgmt - Network Name>: Unable to read CreatingDC parameter, error=2
fd0:fec.10/30[10:30:48.484](226543) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read ResourceData parameter, error=2
fd0:fec.10/30[10:30:48.484](226543) WARN Network Name <Cluster Mgmt - Cluster Name>: Unable to read CreatingDC parameter, error=2
8d0:e40.10/30[10:30:50.187](226543) WARN [FM] FmDeleteResourceType: Resource type Microsoft Message Queue Server does not exist...
8d0:8e8.10/30[10:30:50.218](226543) WARN [EVT] Set propagation state to 0001
8d0:e40.10/30[10:30:52.406](226544) WARN [FM] FmDeleteResourceType: Resource type IIS Server Instance does not exist...
8d0:e4c.10/30[10:30:52.453](226544) WARN [FM] FmDeleteResourceType: Resource type SMTP Server Instance does not exist...
8d0:e40.10/30[10:30:52.500](226544) WARN [FM] FmDeleteResourceType: Resource type NNTP Server Instance does not exist...
8d0:e4c.10/30[10:30:52.546](226544) WARN [FM] FmDeleteResourceType: Resource type IIS Virtual Root does not exist...
8d0:e40.10/30[10:30:52.593](226544) WARN [FM] FmDeleteResourceType: Resource type Time Service does not exist...
 
LVL 22

Expert Comment

by:65td
ID: 22841891
Review NTFS permissions on the <Support Drive>.
 

Author Comment

by:Ireland18
ID: 22842220
OK - so I have added the cluster service account (which is already a member of local administrators) directly to the root NTFS permissions on the Support Drive.  I then tested a failover and got the same result - stuck in "Online Pending".
 
LVL 22

Expert Comment

by:65td
ID: 22842476
Does the cluster service account have full control all the way through the disk (from the root)?
Review the cluster configuration (properties): the Q: drive is not local, correct?

There are lots of "The system cannot find the file specified" (status 2) and access denied (error 5) entries in the cluster log.
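
To put numbers on that observation, here is a minimal Python sketch that tallies the numeric status/error codes in cluster.log lines. The sample lines are copied from the log posted above; the function name `tally_error_codes` and the regular expression are illustrative assumptions, not part of any cluster tooling. Codes 2 and 5 are the standard Win32 ERROR_FILE_NOT_FOUND and ERROR_ACCESS_DENIED.

```python
import re
from collections import Counter

# Standard Win32 meanings for the two codes seen in the posted log.
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND (The system cannot find the file specified)",
    5: "ERROR_ACCESS_DENIED (Access is denied)",
}

# Assumed pattern: matches "status 2", "Status=2", "error=2", "error 5".
CODE_RE = re.compile(r"\b(?:status|error)[ =](\d+)", re.IGNORECASE)

def tally_error_codes(lines):
    """Count how often each numeric status/error code appears in the lines."""
    counts = Counter()
    for line in lines:
        for code in CODE_RE.findall(line):
            counts[int(code)] += 1
    return counts

# Sample lines taken from the cluster.log excerpt in this thread.
SAMPLE = [
    r"908:948.10/29[22:13:50.140](226077) WARN [DM] DmpSafeDatabaseCopy:: SetFileAttrib on BkpPath C:\WINDOWS\Cluster\CLUSDB.BKP$ failed, Status=2",
    "f40:f5c.10/29[22:13:53.781](226089) ERR  IP Address <Print Mgmt - IP Address>: Unable to open node parameters key, status 2.",
    "f40:f58.10/29[22:13:53.781](226089) WARN Network Name <Print Mgmt - Network Name>: Unable to read ResourceData parameter, error=2",
    "f34:5bc.10/30[10:21:26.808](226476) WARN Physical Disk <Support Drive>: Offline, Locking volume failed, error 5.",
]

if __name__ == "__main__":
    for code, count in sorted(tally_error_codes(SAMPLE).items()):
        print(count, "x", WIN32_ERRORS.get(code, f"code {code}"))
```

In the posted log, the only error 5 entries are on the Physical Disk <Support Drive> resource ("Locking volume failed"), which is the resource that then times out in the pending state.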
 

Author Comment

by:Ireland18
ID: 22844582
Yes. Originally the cluster service account was a member of the local admins group, which had full access through every drive from the root.  I have now added that account explicitly to the root of the Quorum and Support drives, with no change in the problem when failing over.
I have the drives configured as below:
D:  Data - 1TB
E:  Print - 50GB
F:  Support - 50GB - Applications are installed here.
Q:  Quorum - 10GB
 
LVL 22

Expert Comment

by:65td
ID: 22849338
"Evicted Bad node from cluster, manually "cleandup Node".  Rejoined Node. "
Manually cleaned up the node?
Did you run, from the cmd prompt, cluster node [node-name] /forcecleanup?
 

Author Comment

by:Ireland18
ID: 22849717
Yep - I have already done the below:
Evicted node:  C:\ cluster node xxxxxxxxxx /forcecleanup
Removed the node from the domain, re-added it to the domain and rejoined it to the cluster.
 
LVL 22

Expert Comment

by:65td
ID: 22851404
The node came in OK; review the clcfgsrv.log under windows\system32\logfiles\cluster\.

The bad node should at least take the default group.

Is the node paused?
 

Author Comment

by:Ireland18
ID: 22851511
Yeah, it won't take the default group, or any group for that matter.
The node is not paused - I just double checked.  To be honest, the log doesn't make much sense to me and is quite large - I will take some time to review it now.  Thanks again for your continued help.
I would post the log but it contains a lot of references to the company name, IP addresses and server names.  I got caught out posting such material before, with a not so good outcome!
 
LVL 22

Expert Comment

by:65td
ID: 22851579
That's fine. I suggest starting from the end of the log and working back.
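
For a large log, that backward pass can be sketched in a few lines of Python. This is only an illustration: the function name `newest_problems` and the choice of filtering on the WARN/ERR level tokens are assumptions, and the sample lines are taken from the cluster.log excerpt earlier in the thread.

```python
import re
from itertools import islice

# Assumed filter: keep lines that carry a WARN or ERR level token.
LEVEL_RE = re.compile(r"\b(?:ERR|WARN)\b")

def newest_problems(lines, limit=20):
    """Return up to `limit` WARN/ERR lines, newest (last in file) first."""
    hits = (ln.rstrip("\n") for ln in reversed(lines) if LEVEL_RE.search(ln))
    return list(islice(hits, limit))

# Sample lines from the cluster.log posted above, in file order.
SAMPLE = [
    "968:96c.10/30[08:20:45.531](226358) INFO [CS] Cluster Service started - Cluster Node Version 4.3790",
    "f34:5bc.10/30[10:21:26.808](226476) WARN Physical Disk <Support Drive>: Offline, Locking volume failed, error 5.",
    "f34:194.10/30[10:24:35.545](226512) WARN [RM] RmpTimerThread: Resource Support Drive pending timed out, CP 0 - setting state to failed.",
    "8d0:8d4.10/30[10:30:42.015](226514) INFO [CS] Cluster Service started - Cluster Node Version 4.3790",
]

if __name__ == "__main__":
    for line in newest_problems(SAMPLE, limit=5):
        print(line)
```

To run it against a real file, read the log into a list first (for example with `open(path, errors="replace").readlines()`, where `path` is wherever your copy of the log lives) and pass that list in.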
 

Author Comment

by:Ireland18
ID: 22851587
I just noticed a lot of these in the clcfgsrv.log.  Do they mean anything to you?
 
PC-PreCreate] PHYSICALDRIVE3: Resource does not want to be managed. Skipping. (hr=0x000001, {CD36919C-9F31-46B4-A29D-AC87F4E6CC93}, {9DAA8CDA-1004-4543-BCFF-4ECF774AA8A7}, 0, 1, 1), (null)
2008-10-28 13:17:57.681 [WARN] (Server Name): [SRV] Enumerating resources. Total Requested:1; Current enum index:1; Total Enums:4. (hr=0x000001, {05AA0768-5F49-49CD-AFDC-96F9D51802D4}, {00000000-0000-0000-0000-000000000000}, 1, 1, 1), (null)
2008-10-28 13:17:57.759 [WARN] (Server Name): [PC-PreCreate] Disk F:: Resource does not want to be managed. Skipping. (hr=0x000001, {CD36919C-9F31-46B4-A29D-AC87F4E6CC93}, {9DAA8CDA-1004-4543-BCFF-4ECF774AA8A7}, 0, 1, 1), (null)
2008-10-28 13:17:58.709 [WARN] (Server Name): [SRV] Enumerating resources. Total Requested:1; Current enum index:1; Total Enums:4. (hr=0x000001, {05AA0768-5F49-49CD-AFDC-96F9D51802D4}, {00000000-0000-0000-0000-000000000000}, 1, 1, 1), (null)
2008-10-28 13:17:58.771 [WARN] (Server Name): [PC-PreCreate] Majority Node Set: Resource does not want to be managed. Skipping. (hr=0x000001, {CD36919C-9F31-46B4-A29D-AC87F4E6CC93}, {9DAA8CDA-1004-4543-BCFF-4ECF774AA8A7}, 0, 1, 1), (null)  
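
A minimal Python sketch for pulling out which resources those "does not want to be managed" entries refer to. The sample lines are shortened copies of the entries above (the trailing GUID details are cut), and the function name `unmanaged_resources` and the regular expression are illustrative assumptions only.

```python
import re

# Assumed pattern: the resource name sits between "PC-PreCreate] " and
# ": Resource does not want to be managed". The leading "[" is left out of
# the pattern because the first pasted line is cut off before it.
UNMANAGED_RE = re.compile(
    r"PC-PreCreate\] (.+?): Resource does not want to be managed"
)

def unmanaged_resources(lines):
    """Return the resource names flagged as unmanaged, in log order."""
    names = []
    for line in lines:
        match = UNMANAGED_RE.search(line)
        if match:
            names.append(match.group(1))
    return names

# Shortened copies of the clcfgsrv.log lines posted above.
SAMPLE = [
    "PC-PreCreate] PHYSICALDRIVE3: Resource does not want to be managed. Skipping.",
    "2008-10-28 13:17:57.681 [WARN] (Server Name): [SRV] Enumerating resources. Total Requested:1; Current enum index:1; Total Enums:4.",
    "2008-10-28 13:17:57.759 [WARN] (Server Name): [PC-PreCreate] Disk F:: Resource does not want to be managed. Skipping.",
    "2008-10-28 13:17:58.771 [WARN] (Server Name): [PC-PreCreate] Majority Node Set: Resource does not want to be managed. Skipping.",
]

if __name__ == "__main__":
    for name in unmanaged_resources(SAMPLE):
        print(name)
```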
 
LVL 22

Accepted Solution

by:
65td earned 500 total points
ID: 22852078
What type of quorum is being used: shared on SAN, local, or Majority Node Set?
You could evict the node again, rename clcfgsrv.log and bring the node in again (after running cluster node [node-name] /forcecleanup and restarting).

Question has a verified solution.
