Windows 2003 Server Software RAID 1 SCSI Drive Replacement
I'm nervous. I have to replace a SCSI drive today that is in a mirrored dynamic disk set (see the attached Disk Management screenshot) in a Windows 2003 SBS server, and it's the boot disk of the set. The mirror is not broken; it is in a "Failed Sync" state because of bad sectors on Disk 0.
I'm worried about making a mistake and this becoming a bigger, more time-consuming process than I want. I've read all kinds of horror stories and see all kinds of pitfalls (see graphic "Device Manager" for SCSI information).
Of course I'm going to do a backup first. I'm planning on using Drive Snapshot, which I know can and will back up a dynamic disk.
In my research I picked up bits and pieces here and there, and this is how I plan to proceed as a result. Please tell me if any of you see any flaws in this:
1. Do a backup. Verify backup.
2. Right-click on the two dynamic disk drive letters (C & E) and click "Remove Mirror" on the drive that has problems. I can't click on Disk0 itself, as that option is not available there (see graphic Can't Remove Mirror from... and Can Remove Mirror from...). This is worrisome, because I've read where others report that only one of the drive letters comes back and not both.
3. Shutdown.
4. Remove SCSI Disk 0.
5. Move SCSI Disk 1 to same cable position as SCSI Disk 0 was.
6. Connect the new drive (same exact model) to the cable position where SCSI Disk 1 was.
7. Reboot the server. I'm worried about this step and the server not rebooting from Disk 1, which was not the boot disk of the set.
8. Open Disk Management. Right-click the mirror on the missing disk, and then click Remove Mirror. Will this be necessary since I already removed the mirror from Disk 0 in step 2, above?
9. Initialize the new drive so it has unallocated space.
10. Right-click the drive that was previously mirrored and choose "Add Mirror".
11. Choose the new drive as the drive to add to the mirror.
12. Hopefully the status will be "Resyncing"!
Attachments: Disk-Managment.png Device-Manager.png Can-t-Remove-Mirror-from-Disk0.png Can-Remove-Mirror-From-Individua.png
On a 2003 machine, usually the easiest way is to swap the position (get a full backup before you do it).
Back up, back up, back up.
Pull the bad drive.
Boot to an install disk (Recovery Console).
Type: FIXMBR
Reboot with DISK1 as the boot disk, then rebuild the mirror. Put a sticky note on the drive housing to indicate the position change, so if someone other than you has to fix it later, they know the order is reversed.
Typically you're back up and sailing smoothly in about 20 minutes.
timinoldsmar
ASKER
Interesting. I guess something like this is in case things do go south. There's no mention of a dynamic disk anywhere in this KB. It is my understanding that breaking a mirror on a dynamic disk set makes gaining access to the data impossible afterward. This KB is therefore talking about a non-dynamic mirrored disk set. The talk about making and booting from floppies is archaic, and the supported systems for the KB are NT only.
Okay, so I guess it is agreed that the system is not going to reboot from Disk 1 unless I do something. Is that a given with SCSI disks in a software dynamic mirrored set?
According to what you are saying, the difference between the boot drive and the non-boot drive in the set is the MBR. Once that is fixed, it will boot from the formerly non-boot disk.
If that one becomes the boot disk of the set, then why would I have to post anything on the drive housing saying so? Is that because cable positioning is meaningless in this situation?
R. Andrew Koffron
I personally wouldn't change the cable position, as I've never done it. RAID controllers write info on disks, so I assume a software RAID would also. I would just try to make the minimum changes, so I'd put a note in there so a future tech would know (something I'd like to see before I made a decision thinking things were default). I'd rather give someone too much info than not enough. The real reason is that I'm not 100% sure what cable position does to affect a software RAID, and I'd just rather be safe than sorry.
timinoldsmar
ASKER
Okay, you see my dilemma then. So many uncertainties. I can try to boot without changing cable positions just to see if it works, as I am with you on keeping changes to a minimum.
Wish this was just a hot-swap hardware mirror. So much easier!
I need you other experts to weigh in on any and all potential issues - especially the ones that I'm nervous about.
Changing positions on a parallel SCSI cable does not change the address; that's set with jumpers on the disk's PCB. There's normally a label on the disk with the SCSI IDs in binary (something like ID 1 2 4).
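To make the binary labelling concrete, here is a minimal sketch of how fitted jumpers add up to a SCSI ID. The function name is made up for illustration, and the (1, 2, 4) jumper set is an assumption based on the "ID 1 2 4" style of label mentioned above; some drives label or arrange their ID jumpers differently, so the label on the actual disk is the authority.

```python
def scsi_id_from_jumpers(fitted, weights=(1, 2, 4)):
    """Return the SCSI ID implied by the jumpers that are fitted.

    `fitted` lists the weights of the jumpers that are installed, e.g. [1, 4].
    `weights` is the assumed set of ID jumpers on the drive; (1, 2, 4) gives
    IDs 0-7 and is an assumption, not something read from a real drive.
    """
    fitted = set(fitted)
    unknown = fitted - set(weights)
    if unknown:
        raise ValueError(f"not an ID jumper on this drive: {sorted(unknown)}")
    # Each fitted jumper contributes its binary weight to the ID.
    return sum(fitted)


if __name__ == "__main__":
    for jumpers in ([], [1], [1, 4], [1, 2, 4]):
        print(jumpers, "-> SCSI ID", scsi_id_from_jumpers(jumpers))
```

No jumpers fitted means ID 0, and moving the drive along the cable changes none of this.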
timinoldsmar
ASKER
Thank you. I think we've definitely concluded that changing cable positions is an exercise in futility. I guess the reason I thought that was something I read where somebody with a similar task said going into the SCSI firmware utility and changing the SCSI ID or the drive that was designated the boot drive made it so he could boot his server from the disk in the mirrored array that was not the boot disk.
This made sense to me, because what does the SCSI controller know about a Windows software RAID1? It just has to pick the disk to get things started. And, from that I guessed that maybe changing the cable positionings would accomplish the same thing.
andyalder
Some controllers will only boot from the lowest disk ID present, but that's set by jumpers.
It is advisable to edit boot.ini to add a few options for different SCSI IDs before breaking the mirror, just so they're available if you do need them. For example, if you look at http://support.microsoft.com/kb/323427 you'll see multi(0)disk(0)rdisk(0)partition(1) in the example. There's no way that one could be told to boot from the second disk, even if the controller was set to boot the second disk, because Windows would only have the option to load the OS from the first one. You don't have to change the default; it's just useful to have a few alternates.
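As a rough sketch of what "a few alternates" could look like, the snippet below takes one multi() ARC path and prints extra [operating systems] lines that differ only in rdisk(). The helper names and the "Alternate" labels are hypothetical, only the multi() form is handled, and in that form rdisk() is the disk ordinal as the BIOS/controller presents it, which may or may not line up with the SCSI ID on a given system. Treat the output as candidate lines to review, not as a tested boot.ini.

```python
import re

# Only the multi() form is handled; scsi() and signature() paths are not.
ARC_RE = re.compile(
    r"^multi\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)(\\.*)?$"
)


def parse_arc(path):
    """Split a multi() ARC path into its numeric fields and directory."""
    m = ARC_RE.match(path.strip())
    if m is None:
        raise ValueError(f"not a multi() ARC path: {path!r}")
    multi, disk, rdisk, partition = (int(g) for g in m.groups()[:4])
    return {"multi": multi, "disk": disk, "rdisk": rdisk,
            "partition": partition, "dir": m.group(5) or ""}


def alternate_entries(path, rdisks=(0, 1)):
    """Build boot.ini lines for the same partition on other rdisk ordinals."""
    base = parse_arc(path)
    lines = []
    for r in rdisks:
        if r == base["rdisk"]:
            continue  # the existing entry already covers this ordinal
        arc = (f"multi({base['multi']})disk({base['disk']})"
               f"rdisk({r})partition({base['partition']}){base['dir']}")
        lines.append(f'{arc}="Alternate - rdisk({r})" /fastdetect')
    return lines


if __name__ == "__main__":
    for line in alternate_entries(r"multi(0)disk(0)rdisk(0)partition(1)\WINDOWS"):
        print(line)
```

The default= line is left alone here on purpose, in line with the advice above to add alternates rather than change the default.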
Thank you to everybody who participated. Unfortunately (or maybe fortunately for me!) I didn't have to go through with this on that old server. After I took a closer look at the drives themselves, one was a hot-swappable drive with an adapter connected to it to enable it to be installed in a non-hot-swappable server (power, SCSI connector, and jumper to set the SCSI ID). This is the good drive. The replacement drive is the same make and model, but I did not have another adapter, so it could not replace the bad drive, which is a regular non-hot-swappable drive.
I therefore talked the owner out of replacing the drive with bad sectors, since it was still online (just had some bad blocks that didn't make it perfectly synchronized with the other drive) AFTER he told me he was planning on replacing the server anyway in four to six weeks.
I don't know for sure what would have worked to get the dynamic disk to boot from the other drive. I did disconnect the bad drive, which I know was the boot drive of the set, just to see if it would boot from the good drive, but no surprises there, it wouldn't.
This is the boot.ini file:
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(1)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(1)partition(1)\WINDOWS="Boot Mirror C: - secondary plex" /NoExecute=OptOut
multi(0)disk(0)rdisk(0)partition(2)\WINDOWS="Windows Server 2003 for Small Business Server" /fastdetect
Maybe just making some modifications here would have done the trick. I don't know. I was equally divided on whether it would be this or resetting a SCSI ID on the good drive. I'm leaning to it being the boot.ini file, since this is a software RAID1.
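One way to reason about the boot.ini theory is to list which rdisk() each entry points at and compare that with the disks that are actually present. The sketch below does that for the file shown above. The function names are hypothetical, and the idea that a lone surviving disk gets enumerated as ordinal 0 is an assumption that was never verified on this server, so this only illustrates the reasoning and does not show what would have worked.

```python
import re

BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(1)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(1)partition(1)\WINDOWS="Boot Mirror C: - secondary plex" /NoExecute=OptOut
multi(0)disk(0)rdisk(0)partition(2)\WINDOWS="Windows Server 2003 for Small Business Server" /fastdetect
"""

RDISK_RE = re.compile(r"rdisk\((\d+)\)partition\((\d+)\)")


def parse_boot_ini(text):
    """Return (default ARC path, list of OS entries) from boot.ini text."""
    section, default, entries = None, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].lower()
            continue
        # Split on the first "=" only; later "=" belong to switches.
        key, _, value = line.partition("=")
        if section == "boot loader" and key.lower() == "default":
            default = value
        elif section == "operating systems":
            m = RDISK_RE.search(key)
            if m:
                entries.append({"arc": key, "label": value,
                                "rdisk": int(m.group(1)),
                                "partition": int(m.group(2))})
    return default, entries


def entries_on_missing_disks(entries, present_rdisks):
    """Entries whose rdisk() ordinal is not among the disks present."""
    return [e for e in entries if e["rdisk"] not in present_rdisks]


if __name__ == "__main__":
    default, entries = parse_boot_ini(BOOT_INI)
    print("default:", default)
    # ASSUMPTION: with one disk unplugged, the survivor shows up as ordinal 0.
    for e in entries_on_missing_disks(entries, present_rdisks={0}):
        print("points at a missing disk:", e["arc"])
```

Under that assumption, the secondary-plex entry (which is also the default) is the one that points at a disk ordinal that no longer exists.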
If anybody wants to weigh in before I accept a solution, or multiple solutions, please do so, and afterward I will make a pick or picks.
Secondary Plex was what I was using to boot before the fun started. I should have mentioned that. The other one wasn't working at all.
Also, I forgot to mention the third possibility, and that was modifying the MBR. I didn't absolutely rule it out. I just thought the boot.ini is the most likely with the SCSI ID being the second most likely.
timinoldsmar
ASKER
Plus there were other questions in the original post that nobody attempted to answer.
LOL, you did fail to tell us you were already booting the OS off the second disk.
timinoldsmar
ASKER
Nope. I wasn't. Or it would have booted when I disconnected the first disk.
timinoldsmar
ASKER
I can't verify this, and I'm not an expert on this subject, otherwise I wouldn't be here asking for help, but I think this is the solution that would have worked if I had gone through with replacing the drive, which, as stated above, I never got to do.