Göran Andersson

Shared disks not showing on clustered server

We have a clustered server, a Compaq Proliant CL380. We had some problems with the shared disks, so we swapped the SCSI controller board for an identical one from another server.

After restarting the server, the shared disks do not show up either as drives in Explorer or in Disk Management. Not on any of the nodes.

They do show up in the Array Configuration Utility, though. As they do, they should also be available in the disk management...

Anyone got a lead on this?
ewtaylor

I have had some issues with this also. How are they shared out? If they are shared out the normal way, then when the cluster rolls (nodes switch) the shares will be lost. The shares must be created through Cluster Manager. I found this out the hard way with user shares, after having to recreate them several times.
Göran Andersson

ASKER

Thanks for the prompt response.

I didn't get any closer to the solution, though. This server has been running in this setup for over a year, and we have switched nodes many times without problems.
I take it you are running win2k? What version of cluster services are you running? Have you made sure all the cluster resources are running on a single node? I would shut down one side while you troubleshoot.
Yes, I'm running win2k, hence the post in the win2k area... ;)

The version of the Cluster Administrator is 5.0. The version of the Compaq Management tools is rev. 4.90. Does that tell you what you wanted to know?

The resources are not working from any node. If I could reach the disks from any node, I would be happy... I have tried starting the server with a single node or both nodes.
You would be surprised how many non-win2k questions are posted here, so it's safer to just ask. That looks like the most recent version of MSCS. It sounds like a hardware issue. Have you checked all the DIP switches, jumpers and terminators?
If you swapped out the controller, it won't know about the disks. I think you have to save the config (to a floppy or something), install the new controller, and then load the old config.
No, ewtaylor, I haven't checked all the switches and such, actually... but we have two almost identical clustered servers (both are Proliant CL380, but they differ in production date) that we are currently trying the cluster controller (CR3500) in. I can't imagine that any switches have suddenly changed by themselves... and not on four nodes at the same time...

We have two servers, so we have two CR3500 units. When we put one of them in either of the servers, it fails to recognise the disks correctly, and the warning light flashes. When we put the other in either server, it seems to recognise the disks correctly. So it seems we have one CR3500 that doesn't work, and one that does. But none of the nodes in either of the servers are able to use the disks with either of the controllers...

darth_wannabe, we have done that in a different way. We have run the erase disk on the node (with all hard drives removed) to clear the nvram, and then run the system setup disks to reconfigure the hardware setup from scratch. That's what the support said we should do... Is that wrong?

The Array Configuration Utility knows about the disks, and shows all the information correctly, but the disks don't show up in Disk Management. So the information about the shared disks is available to the nodes, but they fail to recognise them as disks...

As both servers have the same problem, maybe the controller that seems to work is also faulty in some way? As we have two complete servers except for the controller, this is the only common part... Any thoughts?
Anything in the event viewer? Maybe an event id 1034?
I found no 1034 in any event log. I get event IDs 9, 1009 and 7031 in the system log when the cluster service fails to start. Want details on those? (I'll browse from the cluster then, so I can copy from the logs...)
Not sure which applies here. The first article says an event ID 9 is due to a timeout on the SCSI controller. http://support.microsoft.com/?kbid=259237

This link talks about authentication problems, which do not seem likely. http://support.microsoft.com/?kbid=272129
I think that the timeout is because the disks cannot be accessed at all by windows...

The other article does not apply exactly to my situation, as the cluster service can't start on any of the nodes. I'll look a bit closer at what it has to say about permissions, though.
You might want to get Compaq to send you a new controller and test that. After that I have used up all my cluster knowledge. Sorry I could not be of more help. Please keep us updated.
Right, forgot to close this one... Sorry.

This is what has happened:

We talked to Compaq support, and they said that there was no way of restoring the data. The information needed was stored in the controller that does not work any more.

So, we experimented a bit, as we thought there was nothing to lose. We put the disks in another server, and configured them. They worked fine, but there was no data on them.

Then we put them back in the original server, to reinstall it. When we had installed the operating system on one node, and started to configure the shared disks, they showed up as already configured, and the data was intact. There were a few bad clusters that we had to repair, but other than that, we managed to copy all the data we wanted from the disks before formatting them again.

So, more or less by chance, we managed to do something that even Compaq says is impossible.

I don't know how to award points for this question, though. I'd like to award points for the help given along the way, but there is not really any "correct answer" to accept...
Thanks for the update, good to know... Have them PAQ and refund this question for you.
ASKER CERTIFIED SOLUTION
PashaMod