slselwyn

Problem with RAID rebuild

ProLiant ML350 G5 with embedded Smart Array 200i.
First, the cache battery failed and was replaced with no problem.
The RAID is set up as two logical drives, each a mirrored pair. Logical drive 1 consists of 2 SAS drives with Windows Server 2012 R2 on it. Logical drive 2 has 2 SATA drives, mainly holding data.
One of the drives in logical drive 2 failed, with a warning in Smart Storage Administrator and an amber light on the drive. Two SATA drives identical to the remaining one were purchased, and one was inserted with the server off. On restarting the server I pressed F1 to rebuild the array and all seemed to go well. However, when it finished, Smart Storage Administrator reported 'Logical drive 2 is queued for rebuilding', and on rebooting I am asked to press F1 again to rebuild, which it then tries to do again.
The second purchased SATA drive was thoroughly tested using WD software and fully erased. It was then inserted to see if the fault was with the first replacement drive, but exactly the same thing happened. The rebuild seemed to end at around 87% complete.
I have looked at what is on the first replacement drive and it looks as though it did mirror successfully. There are no errors in the Smart Storage Administrator reports other than the drive queued for rebuild, and the lights on the drive seem to be acting normally.
Any advice would be appreciated.
Member_2_231077

I need you to post an ADU report (which you can get via SSA) to diagnose this. If the rebuild gave up part way through, it is probably because there are too many bad blocks on the other drive in the mirror for a rebuild to be any good. Look for the "read errors hard" count and compare the values from before and after replacing the disk.

Note that you should not power off to replace the disk; that causes all kinds of problems.
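
If you have saved one ADU report from before the swap and one from after, the "read errors hard" comparison can be scripted rather than done by eye. The following is only a minimal sketch: the line layout it expects (the label, an optional `:` or `=`, then a decimal or `0x` hex value) is an assumption, as are the function names, so adjust the pattern to whatever your report actually contains.

```python
import re

# ASSUMPTION: each counter appears as "read errors hard", an optional ':' or '=',
# then a decimal or 0x-prefixed hex value. The real ADU layout may differ.
COUNTER = re.compile(r"read errors hard\s*[:=]?\s*(0x[0-9a-f]+|\d+)", re.IGNORECASE)


def parse_value(text):
    """Convert a decimal or 0x-prefixed hex string to an int."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def hard_read_errors(report_text):
    """Return every 'read errors hard' value found, in order of appearance."""
    return [parse_value(m.group(1)) for m in COUNTER.finditer(report_text)]


def compare(before_text, after_text):
    """Pair counters from two reports by position and return each increase."""
    before = hard_read_errors(before_text)
    after = hard_read_errors(after_text)
    return [new - old for old, new in zip(before, after)]
```

Pairing by position assumes both reports list the drives in the same order. A counter that climbs between the two reports on the surviving mirror drive would be consistent with the bad-block explanation above.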
slselwyn

ASKER

Is there any reason why a hot swap is any better than a shut down and using F1 to start the rebuild during the boot process?
The controller knows what drive has been replaced if you change the disk hot. If you change it cold the controller doesn't know which disk has valid data on it if both come alive after the power cycle. It makes a guess from the metadata and absolves itself from any blame because it was you that pressed the F1/F2 prompt.

Try the experiment yourself, take two disks from two different servers and put them in a new server and see which one gets overwritten on boot. We know which gets overwritten if you hot-swap it.

Assume there is a 1% chance of someone changing the wrong disk. Pull the wrong one out hot and the system crashes; it probably takes 15 minutes to boot again with the right disk. Make the same mistake with a cold swap and it's restore time, although you could still have a working system that's a few weeks out of date.
That's really useful. I have attached the ADU report. Does this show anything that could be causing the problem? I have looked but do not know what might be normal.
ADUReport.txt
ASKER CERTIFIED SOLUTION
Member_2_231077

This solution is only available to members.
BTW, since it's backup/restore time, I would buy a couple more disks: one to replace drive 4 and one more so you can have the same space using RAID 6 (RAID ADG), which can tolerate bad blocks on one drive plus the failure of another.

And before any of this **make sure you have a backup**
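
For the RAID 6 suggestion above, the capacity trade-off is simple arithmetic. This is a rough sketch for equal-sized drives; the function name is hypothetical, and the minimum of 4 drives for RAID 6 is an assumption that depends on the controller, so check what yours supports.

```python
def usable_drives(level, drives):
    """Usable capacity, in units of one drive, assuming equal-sized drives."""
    if level == "RAID 1":
        # This sketch models RAID 1 as a two-drive mirror only.
        if drives != 2:
            raise ValueError("RAID 1 is modelled here as exactly 2 drives")
        return 1
    if level == "RAID 6":
        # ASSUMPTION: the controller needs at least 4 drives for RAID 6.
        if drives < 4:
            raise ValueError("this sketch assumes at least 4 drives for RAID 6")
        # Two drives' worth of space is spent on parity.
        return drives - 2
    raise ValueError(f"unsupported level: {level}")


print(usable_drives("RAID 1", 2))
print(usable_drives("RAID 6", 4))
```

RAID 6 always gives up two drives' worth of space to parity, which is what lets it survive a failed drive plus unreadable blocks on a second one.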
SOLUTION
Yes, that's what you have to do, although you delete array B as well as the logical drive that is on it.
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I have recommended this question be closed as follows:

Split:
-- andyalder (https:#a42225289)
-- slselwyn (https:#a42226750)


If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they see fit. If no one objects, this question will be closed automatically in the way described above.

Pber
Experts-Exchange Cleanup Volunteer