pajkico

asked on

Can't boot ML370 G5 server after one RAID 5 drive failed and a second is in predictive failure

HP ProLiant ML370 G5 server with a degraded RAID 5 volume: one drive is dead and a second is in predictive failure. I need to recover the server, which currently can't boot. The Array Configuration Utility shows "Code 786: Background parity initialization queued or in progress on Logical Drive 1." I tried, and the server currently can't boot into its SBS 2003 OS.

Avatar of Seth Simmons
Seth Simmons

so one is dead, another is dying - not good for a RAID 5

you are looking at getting replacement disks for both, rebuilding the array and restoring from a good backup (you do have that, right?)

If the second disk is predicted to fail, it could already be too late, as it may already be more than a prediction. Besides that, RAID 5 isn't very reliable, and when rebuilding it is very likely that other disks will fail in the process.


So, first you should move to another raid type that is more resilient.


You should also move to another OS immediately; SBS 2003 is antique and has been out of support for years. Running such an out-of-date OS is totally irresponsible and careless: you open your gates to all possible malware, spyware, viruses, etc., as there are no security patches available for that OS.

You'll certainly want to look at an OS update - as Rindi said, 2003 is downright dangerous.

Maybe take a look at RAID10 for your next array setup.

You're probably going to need to recover from backups. If you don't have those, you may end up having to try and salvage data from the existing RAID 5 using 3rd party tools.

Avatar of Member_2_231077
Member_2_231077

It should never have been shut down with a failed drive; they are hot-plug for a reason. That said, it still ought to boot. What happens when you try to boot? Are there any POST messages from the Smart Array controller apart from the 786 error?

Presumably you are using the SmartStart CD to get into the ACU. There is an ADU on there too (or a diagnostics tab in the ACU) where you can generate an ADU report to a USB stick. If you upload it, we can look through it and shed more light on the problem.
Avatar of pajkico

ASKER

This is a new customer who approached us to try to recover the OS and get the server booting. They only have a dump of their accounting database and no other backups.


Sure they need a new server, but they would like to try to recover this one before getting a new one.


When I try to boot, it tells me there's no system disk. ADU diagnostics says that volume 1 (RAID 5) is up and that there's a boot record present.


I was trying to repair the OS from the Windows SBS 2003 CD 1, but it asks for the RAID controller drivers (F6) and can't read them from the USB floppy drive I used. When I tried to install a regular floppy drive in this server, I found that the floppy cable has the second pin blocked on one end and the third pin blocked on the other, so I couldn't connect the data cable to the floppy drive. I guess it is an HP proprietary cable?

https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c00710498-11 is the maintenance and service guide (download icon at the top right of the embedded reader). It lists the diskette cable, but I must admit I've never used an internal floppy drive with one and just stuck to USB ones. You could also slipstream the drivers onto a new Windows CD, but I haven't done that for 10 years and forget how to use nLite.
Avatar of pajkico

ASKER

Sure, I was trying to install the floppy drive since my USB floppy drive didn't do the job. Some comments say that is because it is not an HP USB floppy drive.

Try the rear USB ports rather than the front?

"I was trying to repair the OS from the Windows SBS 2003 "


Keep in mind that the longer you run the disk with the predicted failure, the more likely it will fail further, perhaps completely.  I'd look at booting from a USB stick (preferred) or a CD that has the RAID controller drivers available.  I think the drivers have to be on a floppy for SBS 2003, but you could use something newer that doesn't have the requirement (and may even include the drivers).  Once booted, check the array to see how readable it is.  If it is, I'd back up critical data, then try to do a full backup to an external drive using whatever full backup program you prefer.  If successful (which is questionable but possible), then you could look at restoring the backup to a new set of drives.

The hope, so long as you have not changed anything, is to look at the hardware log to see which drive was kicked out last and force it back online to reconstitute the RAID 5 volume. In a situation such as this it is very dangerous, as forcing the wrong one online will wipe the databases on the drive set.

A failed drive plus a predicted-failure drive usually means that the predictive-failure drive was kicked out long ago and the recently failed drive was the last straw.

As others mentioned backups are ......
Avatar of pajkico

ASKER

I have already replaced the failed SAS 73 GB drive, which might have failed a long time ago. I also added the hot spare drive for the volume 1. When I boot to Acronis it can't see a volume to backup, but sees RAID 5 volume to add as additional disk.

A predictive failure would not bring down the array. As was noted, the RAID 5 experienced two disk failures.
In the RAID controller, how many disks does the RAID 5 volume have, and how many are OK?

You have to look through the log to see which disk was kicked out last. If the drive you replaced was the last one out, is it really dead?

On an HPE MegaRAID/Smart Array controller, it is a tricky situation.
"I have already replaced the failed SAS 73 GB drive": is that shown as a spare drive or did it become part of the array?  If the latter, did the array finish rebuilding?

The answer to arnold's question "how many disks and how many are ok?" is important here.

"When I boot to Acronis it can't see a volume to backup, but sees RAID 5 volume to add as additional disk."  What size volume does it see?  How does it see the disk as being configured?
Avatar of pajkico

ASKER

As I said earlier, one drive failed a long time ago. The second drive is a "predictive failure". The RAID controller disabled the logical drive to prevent damage. The third drive seemed OK. Once I replaced the drive that failed a long time ago, I went ahead and enabled the logical drive, which has been showing up since then. I tried to boot the server after that and it didn't see a system (boot) disk.

After that I tried to repair the OS using the OS Windows SBS 2003 setup CD, and I stopped when I couldn't load the RAID controller driver (P400).

Avatar of pajkico

ASKER

The original RAID 5 was built from three 73 GB 10k SAS disks. I didn't see the rebuilding part at all. I might have to check the logs to see if it did rebuild, but the RAID controller still shows this informational status message:

Code 786, Background parity initialization is currently queued or in progress on Logical Drive 1 (136.7 GB, RAID 5). If background parity initialization is queued, it will start when I/O is performed on the drive.

"which is since then showing up": it sounds as if this started as a 3-drive array.  Is that correct?  What does the controller show as the status of each of the three drives?  It sounds as if one says "predictive failure", but it's not clear about the other two.

It would be very informative if you could boot from a USB stick with a version of Windows that already has the drivers for the RAID controller. Server 2019 likely does and Windows 10 may also. Once booted, you can use diskpart to confirm which disks are seen (list disk) and which volumes are seen (list vol). Note the drive letter for the largest volume. Exit diskpart and do a dir /s on that drive letter. That will tell you a lot about the state of the array.
"Background parity initialization is currently queued": that's telling you that it wants to rebuild the array.

"it will start when I/O is performed on the drive": that implies to me that you need to access the volume to initiate the process.  I don't know if that is absolutely correct.

Somewhere in the controller you can see the status of the rebuild.  If it is stuck at 0% for a while, that would indicate to me that it won't start until you access the drive.  If it is between 0% and 100%, it is rebuilding.

Keep in mind that the rebuild will be based on the data of the old drives, including the one with predictive failure.  If it has any unreadable sectors, you could have more of a problem.
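To illustrate why an unreadable or wrong sector on a surviving drive matters during a rebuild, here is a minimal Python sketch of the XOR parity idea behind RAID 5. It is a toy model only: the block contents and names are made up, and it says nothing about how the P400 actually lays out stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3-drive RAID 5: two data blocks plus parity.
data_a = b"ACCOUNTS"
data_b = b"DATABASE"
parity = xor_blocks([data_a, data_b])

# The drive holding data_b is lost: rebuild it from the two survivors.
rebuilt_b = xor_blocks([data_a, parity])

# If a surviving drive returns a damaged sector, the rebuilt block is
# silently wrong, because the XOR has nothing to check itself against.
damaged_a = b"ACCOUNTX"
bad_rebuild = xor_blocks([damaged_a, parity])

print(rebuilt_b == data_b)    # True
print(bad_rebuild == data_b)  # False
```

That is the reason to copy whatever is still readable off the volume before letting the controller rebuild from a drive that is already predicting failure.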

To load the RAID controller driver you need to hit F6 and have the drivers on a floppy. Old-style mess.

If the volume were up, the system would have booted.

Perhaps the dead drive was a standalone boot drive?
Avatar of pajkico

ASKER

There were no standalone drives. This was originally a RAID 5 with three SAS 72 GB drives. I did run a HDD regenerator on the volume 1 and it found no errors. F8 smart array shows all three drives as Ok. When I boot with the smart start cd, it shows second drive as "predicted failure", but shows the RAID 5 volume there. I would like to start the rebuild, but I didn't see how.

"F8 smart array shows all three drives as Ok": as individual drives or as part of the array?  A screenshot (even from a cell phone) would help.

It's not clear if the controller thinks that all three drives are part of the array and that all three drives are active parts of the array (as opposed to one being available for rebuild).

Avatar of pajkico

ASKER

User generated image


Avatar of pajkico

ASKER

User generated image


Avatar of pajkico

ASKER

I also added another hard drive as a hot spare for the RAID 5 volume, not showing on the top screen since I added it after I took that photo.

Avatar of pajkico

ASKER

User generated image


The "logical drive 1, status OK..." implies to me that the array is complete with 3 drives.

The next step is to try to access it from a different boot device (CD or USB stick).  That will tell you a lot about the state of the files.
Avatar of pajkico

ASKER

User generated image


Avatar of pajkico

ASKER

User generated image


Avatar of pajkico

ASKER

Logical drive more info still points to Parity Initialization Status as "Queued"?

I'd be looking at doing a backup of the volume long before trying to get it to rebuild.  For one, it will give you a good clue of the status of the files.  If you can't access them with a drive that is "predicting failure", you aren't likely to be able to access them after a rebuild.  In fact, the chances may get worse.
Looking at your display, your RAID 5 volume is a three-disk volume: 137 GB, i.e. 2 x 73 GB of data plus 73 GB of parity.

The failed drive was something else, which could explain the "no boot device" error.
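The capacity reasoning above can be checked with a couple of lines of Python. This is only nominal arithmetic using the 73 GB drive size from the thread; the controller's own 136.7 GB figure is a little lower, presumably because of how it counts formatted capacity.

```python
def raid5_usable_gb(drive_count, drive_size_gb):
    """Nominal usable capacity of a RAID 5 set: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb


three_drive = raid5_usable_gb(3, 73)
four_drive = raid5_usable_gb(4, 73)

print(three_drive)  # 146
print(four_drive)   # 219
```

A logical drive of roughly 137 GB therefore fits a three-drive set; a four-drive RAID 5 of the same disks would show up as 200+ GB.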

Avatar of pajkico

ASKER

Well, I have replaced the failed drive, and I didn't replace the one that's "predicted fail". The reason why rebuild didn't start yet could be the status of that "predicted fail" drive. I tried to do the backup by booting into Acronis image backup, but it didn't see a volume to backup.

You need to preload the controller driver to see the volume.

Check the controller BIOS to see what it points to for the boot volume.

Have you tried booting the server with Linux (Ubuntu or Linux Mint) in the hope of loading the array?
Avatar of pajkico

ASKER

Is there a way to preload controller driver when booting with Acronis?

I haven't used Acronis for years, but I assume it does have such an option; how would you restore an image otherwise? But if the RAID array is down, making an image backup would probably not be possible, as the software would require access to the array and the disk partitions on it.


What might work is to rather use a disk cloning software, where you clone 1 disk to the other 1:1, & then repeat that with the other disk. But to do that, you would have to connect the disks to a non-RAID SAS controller to clone the disks. You'd also have to make really sure that you don't select the wrong disk to clone by mistake.


After that you might just be able to import the newly cloned disks into the RAID controller.


As far as I know this is a real hardware RAID controller, so you would boot from the array, & not a single disk.
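For what a 1:1 clone means in practice, here is a minimal Python sketch of a block-by-block raw copy that keeps going past read errors and records where they happened. It is an illustration under assumptions, not a replacement for a proper cloning tool: the function name and block size are made up, and the demo copies an in-memory buffer rather than a real disk.

```python
import io


def clone_raw(src, dst, total_bytes, block_size=64 * 1024):
    """Copy total_bytes from src to dst in fixed-size blocks.

    A block that raises OSError on read is replaced with zeros and its
    offset is recorded, so offsets on the clone stay aligned with the source.
    Short reads without an error are zero-padded but not recorded.
    """
    bad_offsets = []
    offset = 0
    while offset < total_bytes:
        want = min(block_size, total_bytes - offset)
        try:
            src.seek(offset)
            chunk = src.read(want)
        except OSError:
            chunk = b""
            bad_offsets.append(offset)
        if len(chunk) < want:
            chunk += b"\x00" * (want - len(chunk))
        dst.seek(offset)
        dst.write(chunk)
        offset += want
    return bad_offsets


# Demo on in-memory buffers standing in for the source and target disks.
source = io.BytesIO(bytes(range(256)) * 1024)  # 256 KiB of sample data
target = io.BytesIO()
bad = clone_raw(source, target, total_bytes=256 * 1024)

print(target.getvalue() == source.getvalue())  # True
print(bad)                                     # []
```

On real hardware the source and target would be raw devices opened in binary mode, and picking the wrong target would destroy it, which is exactly the mix-up warned about above.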

Have you tried the rear USB ports yet to load the drivers for Windows repair? The front ports don't play well with floppies or USB sticks, they are on a separate chip to the rear and internal ones.
Avatar of pajkico

ASKER

I have two rear USB ports, one is used for keyboard, the other one seems to not work. I could try the one where keyboard is by plugging the keyboard to the front USB. I still believe that I need an HP USB floppy drive for that to work.

Avatar of pajkico

ASKER

I could take the drive that's predicted fail to another server and try to do Acronis disk image of it and restore it to another (better) disk? Then I would put it back to this HP ML370 G5 server. Would that work?

No. That wouldn't work as I explained above. For Disk imaging to work (that is actually the wrong description, it should rather be "Partition Imaging"), you need access to the partitions, which you wouldn't have on a disk that was part of a RAID 5 Array.

Pretty sure there was no such thing as a HP USB floppy drive at the time these were made, I built and installed quite a lot of them as I was working for an HP VAR at the time they were current.

Imaging one drive from a RAID set could be disastrous, cloned metadata onto a disk with a different serial number may baffle the controller.

You could install a couple of new disks in spare bays, set up RAID 10 and install Windows onto them using SmartStart and get to the data that way, but depending on firmware and system ROM version it can be difficult to change the boot order. Not all firmware versions have this option:

User generated image
Avatar of pajkico

ASKER

So, I could let's say add another two SAS drives, and Install a different OS, and then try to access the original RAID 5 volume? The server only has 4GB RAM, so I'll have to either increase / add more RAM or use the OS that could be installed with 4 GB of RAM.

You can install an OS onto a different volume.
Once that OS boots, it will give you access to the RAID 5 when it is online, which by the image it is.

You had four disks in the system with a three-disk RAID 5 of 137 GB.

Not sure which other OS the ML370 G5 supports in terms of hardware ..... and drivers.
The issue/risk, since you added a hot spare, is getting the rebuild going from the predictive-failure drive to the hot spare. The other concern is that the drive that died last was not part of the RAID 5 volume; otherwise you would have had a RAID 5 volume of 200+ GB with one failed drive.

Use 2008/2012 or a Linux-based OS to boot the system, load the Smart Array drivers and rescan; you should then see the array, but since you are presenting the replaced drive as a hot spare......

Within the RAID controller/boot options, there should be a boot search order, which might help identify whether the single dead drive was the sole OS drive.

Or the issue is that the dead drive was the prior hot spare: a 3-drive RAID 5 with a hot spare.
When the predictive failure was detected, the system transitioned to the hot spare, which became active.
When this drive failed, the controller should show the RAID in a degraded state, but it does not.
If the controller reassembled the RAID on boot from the predictive-failure drive plus the other two, it will have wiped/lost information. The predictive-failure drive's data is outdated, but ...
I have seen it: a drive was kicked out of the array.
On reboot the controller brought it back into the array, and data on the array was consolidated based on the newly rejoined drive (data sync); no intervention was required. ...

With old drives, the more time you take, the more likely it is that bad things happen.

Yes, so long as you have the option in the screenshot above from ORCA during POST. If they have the original SmartStart CD that would be ideal to install Win2003 from since it installs the drivers for you. Just make sure it doesn't install on the wrong disks but you can always remove them after creating the new array and logical disk.
Andy, I would be careful with that; I had a similar thought with a poor outcome.
It wiped the other array even though I had created a volume via the controller and, during the OS deployment, told it to use the existing array (created earlier).

Pulling the RAID 5 array drives out to install the OS volume, and then importing the RAID 5 config by reading the metadata from the disks, would be an option if the volume were solid; the current volume's state is unknown.

Another thought, if there is another similar server:

Note which two drives are not having issues (1, 2).

Power off this server, pull the two drives, and see whether the array can be imported into the other system in degraded form.
This way you can see what data, if any, is present, to either rebuild the array in a system that can boot... You have to import the config from the disks within the controller BIOS on boot, or they will be seen as foreign disks....
Risk: total data loss.



If you have a SAS HBA to which you can attach multiple drives, using https://www.runtime.org/raid.htm might be an approach to see whether it can access the RAID, imaging the disks individually (make sure to note their positions so that you can return them to the HP server while it is powered off).

Raw media copy
RAID has two sets of header data, the RAID config metadata and then the data ......
There is no import function on Smart Array controllers, that's why I said create a new array with the disks in, then pop the old ones out during install to avoid overwriting. This ensures the metadata on all the disks matches. After installing, powering off and pushing them back there will be a POST message about Array A being available again and since the RIS on all disks matches it will automatically re-enable the 3 disks.
While I agree with the concept of setting up a new array (with the existing array disconnected) and then looking for data on the old array, couldn't that be accomplished much more easily with a Windows 2019 USB installation stick?  I'm presuming that 2019 will have the drivers for the RAID controller.
I don't think the drivers are inbuilt even with 2019, it's not the case that software vendors are too lazy to provide them but that HP/HPE won't let them. That's why all the data recovery tools don't support Smart Array controllers out of the box, HPE are copyright control freaks.

SmartStart install is going to be simplest as it puts the drivers on for you but that requires an OS the same age as the machine so 2003 or maybe 2008.
@Andy:
Thank you for the clarification on the drivers.

One could go down the USB path with F6 drivers on it using an OS that supports loading drivers from other than a floppy drive.  This presumes that HP has drivers for such a later version of Windows Server.
2003 would not recognize the USB as a viable source; it will look for a floppy or possibly a CD.
The other issue: I'm not sure whether the ML370 can boot from USB.
Avatar of pajkico

ASKER

The USB floppy drive is used to provide the RAID controller (P400) drivers for the OS. This is a standard procedure.


What I have a problem with now is the RAID controller (P400). I used to be able to boot to SmartStart, but now it hangs there and the progress bar never goes anywhere. I might have to replace the RAID controller, which might have been giving me issues from the beginning.

Avatar of pajkico

ASKER

This server only has 4 GB of RAM, so Windows 2008 couldn't be installed, so I'll have to go with 2003.

The P400 can hang with garbage in cache RAM. Since it was shut down properly, you can clear the cache by taking the cache module off and disconnecting the (probably dead) battery. Probably won't work but only takes 5 mins.

I can't recall that Windoze 2008 couldn't get installed on a system with 4GB or less RAM. If I remember correctly, there was even a 32bit version of 2008 (2008r2 was 64bit only I believe). 32bit OS's should easily install & run on a system with 4GB RAM, but I think also the 64bit versions should be installable.

If you can't install that OS, you must be having another issue, not the amount of RAM.


But some servers have a setting in the OS that allows you to limit the max RAM for the installation. I think that was mainly used to get OS2 installed.

Once you added a new drive as a hot spare, has it begun transitioning the volume from the predictive-failure drive?

This says that the minimum RAM requirement for Server 2008 was 2G: https://learn.microsoft.com/en-us/iis/install/installing-iis-7/install-windows-server-2008-and-windows-server-2008-r2

How did you determine: "windows 2008 couldn't be installed"?
Avatar of pajkico

ASKER

I haven't seen that it has transitioned. It is still saying the same thing (code 786).

Avatar of pajkico

ASKER

I could install Windows 2008 32-bit, or Windows 2003 32-bit. I do have Windows 2003 Enterprise 32-bit disk with me, so I could install that one on a pair of 73 GB SAS drive that i have installed and created a mirrored volume. 

If memory serves, on HP of that generation you had to actively indicate that the failed drive IS being replaced, unlike other systems where you merely remove the failed drive and insert its replacement.

You should install an OS on a new volume; the controller should then indicate that volume as a boot device.

I would boot using the Win2k3 media in this case and use F6 to present the uncompressed drivers.
Floppy or CD media (with CD media, you may have to swap discs in and out until the drivers are loaded); a rescan should then detect the volumes.
Avatar of pajkico

ASKER

I tried installing win 2003 using smart start cd, but after the reboot it didn't boot to the OS. I booted off the win 10 USB, and went into diskpart. List disk shows RAID disk, and a new mirror that I created for the new Win 2003 OS. When I list partitions under win 2003 (mirror), it only shows one 4096 MB primary partition. RAID 5 (136 GB) disk doesn't show any partitions. What's your take on this? In BIOS I have "C" as a boot option.

Avatar of pajkico

ASKER


User generated image
Avatar of pajkico

ASKER

Now it says that the Parity Init Status is "In Progress". Before I had as "queued".

You have to look at the Smart Array to make sure which logical volume it has listed as the boot device.
Unlike the regular boot search, the RAID controller has a single volume set as the boot volume, versus scanning through existing volumes until it finds a bootable/active one.
Avatar of pajkico

ASKER

Ok, finally the Parity Initialization has completed. What does that really mean?

Avatar of pajkico

ASKER


User generated image
Avatar of pajkico

ASKER

I currently have all three of the RAID 5 volume drives blinking together every 2 seconds

Avatar of pajkico

ASKER

also, how do I run the hpacucli on this server to get more info about the RAID 5 volume?

Avatar of pajkico

ASKER

As you can see in the picture, the last three drives are blinking green together?

User generated image


Have you set Logical drive 1 on Array B to be the boot disk? I put a screenshot of that setting in my post on Friday.

Also if you click the diagnostics tab in the ACU you can get an ADU report on USB stick which has all the info you could ever want.
Avatar of pajkico

ASKER

I don't have an option to select the boot volume. Do you know where I can find the firmware update for the P400?

Windows version is at https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_1a18b793015743c8b49b16944c but may not be usable to you as you have no Windows server OS.

Linux/Supplemental is at https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_7e4ae58f0d9c42f6a3d47b587c - this can be used as a supplement to the firmware DVD

DVD (amazingly still available) https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_4372f442c3424063809e3d9198 but requires HPE account.
Alt link probably good http://www.mediafire.com/file/9m4rhs9mny15da4/firmware-10.10-0.zip

There is an alternative method to swap Array A with Array B but don't use until other options used up. Pull the 3 disks, boot from the mirrored pair and they should become Array A so be the boot volume. Shut down and add the RAID 5 and the controller will make that Array B since A is already in use.
Avatar of pajkico

ASKER

ADUReport.txt


Here's the ACU report. I have also upgraded firmware as per your link.

The option to set the boot volume may be stored in CMOS rather than on the card so the system ROM (motherboard BIOS) may need updating.
Can't see much wrong in the ADU report except for the one with predictive failure lit although 2I:1:1 also shows 6 media failures since factory. All show 0 for read errors hard which is a good sign.

Performance must be pretty poor, it's running RAID 5 without cache battery but it's not as if that's something that happened recently, probably sold that way.
Avatar of pajkico

ASKER

I do have an option to change the boot volume now after firmware update.

So if you set it to Array B you can boot the Windows you installed via SmartStart and the data should be available since the volume is sound at least according to the ADU report and the ACU screenshots. May be corrupt but chkdsk will verify that.
Avatar of pajkico

ASKER

I did boot to Windows 2008, but the RAID 5 volume is showing in disk manager and asking me to initialize it, which I don't want to do so I don't erase it.

Take a peek at it with https://www.runtime.org/captain-nemo.htm; it's read-only so won't cause any damage.
If it asks to initialize, the data likely got wiped.
The other possibility is to use TestDisk.
Potentially the partition table got corrupted......

Yes, maybe corrupt partition table, that's why I suggested Captain Nemo, it's not a full-blown data recovery tool like Getdataback but it will cope with bad partition tables.
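As a rough idea of what "asks to initialize" can mean at the byte level, here is a small read-only Python sketch that inspects a 512-byte MBR sector for the 0x55AA boot signature and lists any non-empty partition entries. The sample sectors are synthetic, and how you obtain the first sector of the real volume (raw device access with administrator rights) is deliberately left out; treat it as a sketch, not a diagnostic tool.

```python
import struct

SECTOR_SIZE = 512


def parse_mbr(sector):
    """Return (has_signature, partitions) for a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly 512 bytes")
    # A valid MBR ends with the bytes 0x55 0xAA at offsets 510 and 511.
    has_signature = sector[510:512] == b"\x55\xaa"
    partitions = []
    for index in range(4):
        # Four 16-byte partition entries start at offset 446.
        entry = sector[446 + 16 * index: 446 + 16 * (index + 1)]
        boot_flag, part_type = entry[0], entry[4]
        first_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if part_type != 0:  # type 0 means the slot is unused
            partitions.append({
                "index": index,
                "active": boot_flag == 0x80,
                "type": part_type,
                "first_lba": first_lba,
                "sectors": sector_count,
            })
    return has_signature, partitions


# An all-zero sector: no signature and no partitions.
blank = bytes(SECTOR_SIZE)
print(parse_mbr(blank))  # (False, [])

# A synthetic sector with one active NTFS-type (0x07) partition.
sample = bytearray(SECTOR_SIZE)
sample[446] = 0x80
sample[446 + 4] = 0x07
struct.pack_into("<II", sample, 446 + 8, 63, 1000)
sample[510:512] = b"\x55\xaa"
sig, parts = parse_mbr(bytes(sample))
print(sig, parts)
```

A missing signature or an empty table on the RAID 5 volume would be consistent with a wiped or corrupted partition table, but it does not by itself say whether the file data further into the disk survived.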
ASKER CERTIFIED SOLUTION
Avatar of pajkico
pajkico
If it were in my shop, I'd still give it a try with the RAID Reconstructor software from runtime.org. It's the same company that produced the Captain Nemo that andyalder mentioned. It isn't clear to me if they have a trial version that will give you a better idea of whether or not it can recover anything. It runs $99 if you have to buy it. I've been successful at several RAID 5 recoveries with it. One key is to make sure that you try to recover from the correct set of drives.

That being said, a competent data recovery company should be well versed in it or similar tools.  If they couldn't recover anything, it's unlikely that a regular tech would.  I'd give it a try, though, just for my peace of mind.
RAID may be nothing to do with it, could have been running with a disk down for weeks and then a software fault or virus destroyed the data. There are no bad blocks on the disks so it should be working fine as far as hardware goes.
Avatar of pajkico

ASKER

I know, but it's out of my hands now.