ravenrx7


Dell 2850 not booting to OS after restart

Hello,

We have an old 2850 running a hardware RAID 5. After a simple reboot today we're getting an error, which I've attached. Looks like one of the drives is out?? Any ideas?

what should I do?
IMAG0859.jpg
IMAG0860uu.jpg
Brian Harrington

do you have a current backup?
joinaunion
Try pressing F2 at startup, then go to the Integrated Devices screen options.
Is raid enabled?
http://support.dell.com/support/edocs/systems/pe2850/en/ug/t1390c30.htm#wp1043338
ravenrx7

ASKER

yes i have a backup, acronis, but im not going that route, i think its a problem with the raid and one of our hard drives.. join - let me check, one sec
You may just need to rebuild the array, unfortunately, it may screw up....
yes raid is enabled - join
Are any of the disks showing a blinking orange light?
What if you press any key (to continue)? If only one disk is broken, the other two (I'm just assuming you have three, which is the minimum for RAID 5) should be able to start and handle the situation. You should also check the bad disk and either return it under warranty or buy a new one, and then rebuild the RAID.
on the hard drives front panel- no sir
if i hit any key it tries to boot from the NIC, where i have a PXE setup; went into BIOS and the boot order is HD first
ASKER CERTIFIED SOLUTION
David
i have 5 drives
I do have a full backup, we run Acronis Server backup, so do I proceed with a RAID repair?
Assuming you did just a reboot and did not remove and replace any drive; did you try the standard fix?

Shutdown the server
pull the power cords (all of them)
eject drive 0
replace drive 0
eject drive 1
replace drive 1
continue with all the drives
reconnect power
boot up
no, spending that cash on that is not an option, rather stupid idea. come on. I have a RAID 5 system and you're saying throw in the towel and take it somewhere? come on
These are your options: if you have the ROMB key installed, select RAID; otherwise select SCSI.

Embedded RAID Controller
      

Selects between RAID Enabled, SCSI Enabled, or Off. The configurable options vary, depending on whether the optional ROMB key and memory are installed.

    With the ROMB key and memory module installed — Select either RAID Enabled or Off.
    Without the ROMB key and memory module installed — Select either SCSI Enabled or Off.

Channel A and Channel B operate independently. If the Channel A displays RAID Enabled, Channel B can be set to RAID Enabled, SCSI Enabled, or Off.
let me try that.. i have not reconnected the drives, I haven't made any changes in probably 3 months
another option if you get no love from reseating the drives is to remove one drive and reboot. If you get a different error; like no RAID set, reinsert the drive and move on to the next. When you get a degraded state without an error it should boot. Most likely that drive is bad and/or inconsistent. Once booted in the degraded state, insert the drive, it should rebuild it automatically. I have done this before with a drive that was at the beginning state of failure. Once I determined the drive that was failing, I replaced it.
the resets didnt works.

join--so i have channel a raid channel b scsi
k Hertiage let me try that
ravenrx - you misunderstand.   I am saying that IF you do not have a backup, then those drives are years out of their designed lifespan of 3 years.  They may not survive a rebuild.  

So if you kick off a rebuild, and have a hard crash, nothing in the world will be able to get it all back, and you will pay $4000+ for just part of the data.

Or you can spend $4000 and probably get 99.999% of your data.

Or pay $500 - $1000 and take image backups of the raw disks first, just in case.

If you do have a current backup, no worries. Tell it to continue and let it boot up in degraded mode. If it does NOT boot in degraded mode, you have metadata failure or mismatch.  A dead battery can cause a mismatch.  Either way, default will be to use the data on the disks, not the NVRAM that is likely unprotected due to dead battery.

Bottom line, tell it to go ahead and continue to boot. No need to do anything in the RAID menu. See if it boots .. (But realize you have maybe 25% risk of losing all you have on the hard drives due to the age of the equipment alone).

If you have never taken and TESTED a full backup and recovery, then I would hire somebody to go onsite.  Too many variables here and you certainly have a multiple failure scenario already.
trying the drives swapping and rebooting, now
dlthe - yeah it wont boot, i do get the mismatched error - i think it shows in one of the images i uploaded.. battery like the CMOS?
battery like the one on the RAID controller.
Absolutely it is the battery.  I have some old CERCs.  Battery good for 5-7 years typically.   But you have to get the system online and fully operational with all disks before you change out battery.    I think it is that standard 2012 you can buy at any drug store.  Can't remember.
can i replace that battery?
yeah this server has been up and running since jan 2005 LOL!
dang, went through all the drives, swapped, rebooted, and none went ahead and booted to the OS,
You may be able to replace the battery. Call Dell support.  They will take the call, and sell you the battery.  But you still may have to recover from backup.
That is why you never saw the problem until now.  But really, take it from me, those disks are at high risk of dying.  A parity rebuild is the most stress those drives have had since the system was installed and you really do have high risk of a 2nd HDD failure during a rebuild.

I have first-hand experience with drives dying on this very system and controller.  Rather annoying because while the data was protected, had to rebuild the O/S.  <sigh>

It just isn't worth throwing more money at those drives or even system.  You're probably getting 30 whole MB/sec write speed too on that RAID5 with that controller.
so none of those raid options i need to do?
yeah sad thing is we have a new server coming in on the 24th,
The battery is a secondary issue and is not preventing the system from booting.   You have at least 3 failures all working together to give you grief
 - metadata mismatch
 - No NVRAM, so no event log
 - probable disk failure
 - probable unreadable blocks on surviving disks
 - near certain XOR parity mismatches
 - Data corruption (highly probable, as the system was already w/o NVRAM due to dead battery, so you did lose at least a little bit of data)
 - I doubt you have ever done a consistency check or scan/repair bad blocks - Well, no way you have.. So this means you have XOR parity issues and certainly unreadable blocks. it is almost a statistical impossibility not to have them.
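
To show what a consistency check is actually verifying, here is a rough Python sketch. It assumes each stripe is a list of equal-length blocks where the last block is the XOR parity of the rest; that layout and the function names are made up for illustration and are not how the controller firmware stores things.

```python
def stripe_is_consistent(stripe):
    """A stripe is consistent when all of its blocks XOR to zero."""
    acc = bytearray(len(stripe[0]))
    for block in stripe:
        for i, byte in enumerate(block):
            acc[i] ^= byte
    return not any(acc)

def find_inconsistent_stripes(stripes):
    """Return the indexes of stripes whose parity does not match the data."""
    return [n for n, stripe in enumerate(stripes)
            if not stripe_is_consistent(stripe)]

# Hypothetical array contents: stripe 1 has stale parity, e.g. from a
# write that sat in an unprotected cache and never reached the disks.
stripes = [
    [b"\x01\x02", b"\x04\x08", b"\x05\x0a"],  # parity matches the data
    [b"\x10\x20", b"\x01\x01", b"\x00\x00"],  # parity is stale
]
print(find_inconsistent_stripes(stripes))
```

Note that a mismatch only says data and parity disagree; it does not say which block is the wrong one, which is why a rebuild on top of mismatched parity can quietly produce bad data.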

You need to at least image each disk to scratch drives with a non-RAID controller, and try to reconstruct with that using some commercial software from runtime.org, but that software will NOT deal well with read errors, or a partial parity rebuild - no way of knowing due to the dead battery.
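
As a rough idea of what "image each disk" means, here is a minimal Python sketch that copies a source to an image file chunk by chunk and zero-fills any chunk that fails to read, so the offsets in the image stay aligned with the disk. The function name, chunk size, and paths are assumptions for illustration; purpose-built imaging tools retry and log far more carefully than this.

```python
import os

def image_disk(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src to dst chunk by chunk; zero-fill chunks that fail to read.

    Returns the list of byte offsets whose chunk could not be read.
    """
    bad_offsets = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size in bytes
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                src.seek(offset)
                chunk = src.read(want)
            except OSError:
                # Unreadable region: remember it and write zeros instead.
                chunk = b""
                bad_offsets.append(offset)
            # Pad failed or short reads so the image keeps the disk layout.
            dst.write(chunk.ljust(want, b"\x00"))
            offset += want
    return bad_offsets
```

Usage would look something like `image_disk("/dev/sdX", "disk0.img")`, where both names are placeholders. The returned offsets tell you which parts of the image are filler rather than real data.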

If the data isn't backed up, you need to pay somebody to at least try to do this for you, unless it all makes perfect sense and you have the hardware to do it.  Maybe $1000 to reconstruct it all to a single non-raid disk drive you can boot, with minimal damage.   (After they run hardware diagnostics to assess the risk).

No way could I talk you through this, as you don't have the software anyway.
It knew it was about to be replaced.  That's how they roll.  Try calling dell tech support.  you still have phone access with them.
support.dell.com, there is a phone number on the screen,
yeah I have an old dell 1800 server, im in the process of restoring my image to it just so i can get the users up and running on this. if that works, when the new server comes in ill just restore to it. in the meanwhile ill have this 2850 with 5 drives not being used.. i cant just reconstruct it, or just say screw it.
You may want to give this a read,
http://support.dell.com/support/edocs/systems/pe2850/multlang/ROMB/F6586A00.pdf

What happens if you set both channels to SCSI?
Rebuild the array and restore from backup. You lose more time and nerves with these attempts to fix the RAID than you would lose trying to restore from backup.
0 physical drives detected, so it could be backplane power missing, data cable disconnected or one disk pulling the whole bus down.

I would power up with just one disk installed and see if it spins up and gets detected. It won't rebuild of course with 1 out of 5 disks present but it'll give you more info, then if it is one naughty disk you can power on with 4/5 of them present and rebuild from a normal degraded state. If not at least you have a backup available.
I dont know if this matters but this morning I came in, booted up the server (crashed server) and the first two bays have amber lights blinking..
when i went to rebuild, I show 5 failed drives.. so i need to replace that battery right
Sounds like a double disk failure. Very rare indeed. A restore is probably your only option now. You will need to get two new disks.

If it booted this morning, perhaps the drives were cool and are overheating. Really doesn't matter, they will fail again.
oh it didnt boot, i just noticed the blinking leds which werent on yesterday. so its possible i replace those two disks and it still wont boot, right?
Does it still say "0 physical drives found on the host adapter" during POST?
Correct. RAID 5 can tolerate a single disk failure. The extra drive is used for parity. Actually the parity is spread across the drives, but a two-drive failure means it won't be able to find the data.
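
To make the parity point concrete, here is a minimal Python sketch. It assumes a toy stripe of four data blocks plus one XOR parity block; the block contents and the helper name `xor_blocks` are made up for illustration, and a real controller also rotates parity across the drives and works on much larger stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy stripe: four data blocks (hypothetical contents) and their parity.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# One drive lost: XOR of the survivors and the parity gives the block back.
survivors = [data[0], data[1], data[3]]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[2]

# Two drives lost: the same XOR only yields the XOR of the two missing
# blocks, which is not enough to recover either one on its own.
two_left = [data[0], data[3]]
mixed = xor_blocks(two_left + [parity])
assert mixed == xor_blocks([data[1], data[2]])
assert mixed != data[1] and mixed != data[2]
```

With five drives the idea is the same: any one missing block can be recomputed from the other four, but with two missing there is one equation and two unknowns.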
1. Replace the battery
2. Replace the drives
3. If that does not help - rebuild.
ok understand now, just a question.. someone was talking about RAID or SCSI, i took a picture of this, is there a way to move this over to SCSI? would that work
IMAG0863.jpg
Still all drives seen as failed/offline I see, do you have a spare disk to test with?
I dont.. ughh.. I'll order a few disks, replace the battery (which is on the RAID card, right?), rebuild the array and then restore the backup, man what a pain!
Since none of the disks are seen the data on them may be intact, what you are seeing is the same as if the SCSI cable had come off. I'd still power it on with just one disk in and see if it sees it (obviously dont accept to change any settings with just one disk it's just for test).
If the ROMB battery has indeed failed then no, you can't use SCSI. It's dependent on the battery also.
You said there were flashing lights, what color were they?
Here is a list of what the lights mean for troubleshooting.
http://support.dell.com/support/edocs/systems/pe2850/en/it/t1393c20.htm#wp1039173

Run system diagnostics instructions here.
http://support.dell.com/support/edocs/systems/pe2850/en/it/t1393c40.htm#wp1033246
How are things going?
You still with us?