slselwyn
asked on
Problem with Raid rebuild
ProLiant ML350 G5 with an embedded Smart Array E200i controller.
Firstly the cache battery failed and was replaced with no problem.
The RAID is set up as two logical drives, each a mirrored pair. Logical drive 1 consists of 2 SAS drives and holds Windows Server 2012 R2. Logical drive 2 consists of 2 SATA drives and holds mainly data.
One of the drives in logical drive 2 failed, with a warning in Smart Storage Administrator and an amber light on the drive. Two SATA drives identical to the remaining one were purchased, and one was inserted with the server off. On restarting the server, F1 was pressed to rebuild the array, and all seemed to go well. However, when it finished, Smart Storage Administrator reported 'Logical drive 2 is queued for rebuilding', and on rebooting I am asked to press F1 again to rebuild, which it then tries to do again.
The second purchased SATA drive was thoroughly tested using WD software and fully erased. It was then inserted to see whether the fault lay with the first replacement drive, but exactly the same thing happened. The rebuild seemed to end at around 87% complete.
I have looked at what is on the first replacement drive and it looks as though it did mirror successfully. There are no errors in the Smart Storage Administrator reports other than the drive queued for rebuild and the lights on the drive seem to be acting normally.
Any advice would be appreciated.
ASKER
Is there any reason why a hot swap is any better than a shut down and using F1 to start the rebuild during the boot process?
The controller knows which drive has been replaced if you change the disk hot. If you change it cold, the controller doesn't know which disk has valid data on it when both come alive after the power cycle. It makes a guess from the metadata and absolves itself of any blame because it was you who answered the F1/F2 prompt.
Try the experiment yourself: take two disks from two different servers, put them in a new server, and see which one gets overwritten on boot. We know which gets overwritten if you hot-swap it.
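To make the hot versus cold distinction concrete, here is a toy model in Python. It is only a sketch of the reasoning above and not the Smart Array firmware's actual logic: the `Disk` fields, the `generation` counter, the bay numbers, and the function `pick_rebuild_source` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Disk:
    bay: int
    array_id: Optional[str]  # hypothetical: array the on-disk metadata claims; None if blank
    generation: int          # hypothetical: counter bumped on each configuration change


def pick_rebuild_source(disks: Sequence[Disk],
                        hot_swapped_bay: Optional[int] = None) -> Optional[Disk]:
    """Return the disk to copy FROM, or None if the choice is ambiguous."""
    if hot_swapped_bay is not None:
        # Hot swap: the controller watched the bay change while running,
        # so the disk in every other bay is known to hold the valid data.
        survivors = [d for d in disks if d.bay != hot_swapped_bay]
        return survivors[0] if len(survivors) == 1 else None

    # Cold swap: nothing was observed, only on-disk metadata is available.
    claimed = [d for d in disks if d.array_id is not None]
    if not claimed:
        return None
    newest = max(d.generation for d in claimed)
    best = [d for d in claimed if d.generation == newest]
    # This is the "guess": the disk with the newest-looking metadata wins,
    # whether or not it actually holds the data you care about.
    return best[0] if len(best) == 1 else None


good = Disk(bay=3, array_id="B", generation=5)
blank = Disk(bay=4, array_id=None, generation=0)
foreign = Disk(bay=4, array_id="X", generation=9)  # disk pulled from another server

print(pick_rebuild_source([good, foreign], hot_swapped_bay=4))  # the good disk
print(pick_rebuild_source([good, foreign]))                     # the foreign disk
```

In this model a blank replacement is harmless either way, but a cold-swapped disk that carries metadata from elsewhere can be chosen as the source, which is the overwrite risk the experiment above is meant to demonstrate.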
Assume there is a 1% chance of someone changing the wrong disk. Pull the wrong one out hot and the system crashes; it probably takes 15 minutes to boot again with the right disk. Make the same mistake with a cold swap and it's restore time, although you could still have a working system that's a few weeks out of date.
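The arithmetic behind that comparison can be sketched as follows. The 1% error rate and the 15-minute reboot come from the post above; the 8-hour restore time is purely an assumption picked for illustration, since the real figure depends on your backup system.

```python
def expected_extra_downtime_minutes(p_wrong_disk: float, recovery_minutes: float) -> float:
    """Expected downtime added by the chance of pulling the wrong disk."""
    return p_wrong_disk * recovery_minutes


P_WRONG_DISK = 0.01            # from the post: 1% chance of changing the wrong disk
HOT_RECOVERY_MINUTES = 15      # from the post: reboot with the right disk
COLD_RESTORE_MINUTES = 8 * 60  # ASSUMPTION: hypothetical full restore from backup

hot = expected_extra_downtime_minutes(P_WRONG_DISK, HOT_RECOVERY_MINUTES)
cold = expected_extra_downtime_minutes(P_WRONG_DISK, COLD_RESTORE_MINUTES)

print(f"hot swap:  {hot:.2f} expected minutes lost")
print(f"cold swap: {cold:.2f} expected minutes lost")
print(f"ratio:     {cold / hot:.0f}x")
```

Downtime is only part of the cost: the cold-swap mistake can also leave you with data that is weeks out of date, which this simple estimate does not capture.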
ASKER
That's really useful. I have attached the ADU report. Does this show anything that could be causing the problem? I have looked but do not know what might be normal.
ADUReport.txt
ASKER CERTIFIED SOLUTION
BTW, since it's backup/restore time, I would buy a couple more disks: one to replace drive 4 and one more so you can have the same space using RAID 6 (RAID ADG), which can tolerate bad blocks on one drive plus another failed drive.
And before any of this **make sure you have a backup**
SOLUTION
Yes, that's what you have to do although you delete array B as well as the logical drive that is on it.
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.
I have recommended this question be closed as follows:
Split:
-- andyalder (https:#a42225289)
-- slselwyn (https:#a42226750)
If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they feel fit. If no one objects, this question will be closed automatically the way described above.
Pber
Experts-Exchange Cleanup Volunteer
Note that you should not power off to replace the disk; that causes all kinds of problems.