Dell PowerEdge T300 logical drives keep disappearing about an hour or two after boot!

Max Nomad

I've got a Dell PowerEdge T300 server with 20GB RAM and 1.50TB storage, running Small Business Server (SBS) 2011. The main function of this server is handling Exchange 2010. Roughly 1 to 2 hours after boot the server slowly begins to crash.

At first it looked like the SQL Server database was full and something was possibly swamping the processes, causing the server to freeze. While watching the disk accesses in the Resource Monitor I saw that the logical drives (C: and E:) would suddenly disappear. Once that happened all functions on the server would slowly crash and I would have to do a hard reboot. The reboot would take about 20 minutes before I could get to the actual desktop to view the event logs, Task Manager and Resource Monitor. Everything would run like normal, then after about an hour or two the crash process would happen all over again (logical drives would disappear, Explorer would freeze, I'd need to do a hard reboot, etc.).

This led me to check the RAID controller (Dell SAS 6IR). At times it froze while I was trying to go through the menu options. Having dealt with this before on another Dell server, I suspected that the RAID battery might be failing. Looking at Array SAS1068E, it shows a status of Degraded and 0% Syncd. As I type this I'm about to return to the office to confirm which physical drive has failed.

On Friday afternoon I got in touch with Dell, but they made it clear that I wouldn't be able to get real support until sometime on Monday 07/15/19. My tentative plan was to buy a battery for the RAID controller along with two replacement drives, but it may take more than a couple of days for them to arrive.

Here are my questions:

(1) Is this behavior consistent with a failing RAID controller battery or failed drive in an array?

(2) Is it possible to stabilize the server with one drive while waiting for the replacement hardware to get here?

I'm in a serious crunch since the company needs email, and payroll also happens on that server. Thanks in advance for your time and input!
Top Expert 2014

Commented:
It's unlikely to be the battery, as the card will run, albeit slowly, with a dead battery. It's unlikely to be a failing disk either; more likely it's the 6iR itself.
Max Nomad, IT Consultant

Author

Commented:
Thanks Andy. Would the 6IR fail intermittently instead of just failing outright? And since it says "Integrated SAS", is that part of the motherboard? Or could I purchase another 6IR card for that tower to keep it up until they're ready to get another server in place?
Top Expert 2014

Commented:
It's a PCIe card but may have its own special slot. I'm not sure which part number without the machine's serial number to look up on Dell's support site.

If it were a disk then you would see it failing in the OMSA controller log. Likewise, a bad battery appears in OMSA. A PCIe error would lead to the card crashing, but PCIe bus faults normally appear on the LCD screen. The first thing to do, though, would be to simply reseat it, as you are unlikely to be able to buy one today. You can also prove it out by seeing if it still hangs without disks and battery connected.

They're dirt cheap as the server's fairly old, $10 on eBay.

https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=sas+6+ir&_sacat=0 shows several different shapes. The one with the blue clip is a pain to get out of the slot, but I think the T300 uses a regular slot.
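
The checks described in this comment can be written down as a small triage helper. This is only a sketch of the reasoning above, not a diagnostic tool: the `Observations` fields and the `suspects` function are hypothetical names, and every flag is filled in by hand after looking at OMSA, the LCD panel, and the card itself. Nothing here queries the hardware.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """Hand-entered findings; all field names are illustrative assumptions."""
    omsa_logs_disk_failure: bool = False         # disk errors in the OMSA controller log
    omsa_reports_bad_battery: bool = False       # battery fault shown in OMSA
    lcd_shows_pcie_fault: bool = False           # PCIe bus fault on the front LCD
    hangs_with_disks_disconnected: bool = False  # card still hangs with nothing attached


def suspects(obs: Observations) -> list[str]:
    """Map the findings to the components the thread points at."""
    found = []
    if obs.omsa_logs_disk_failure:
        found.append("disk")
    if obs.omsa_reports_bad_battery:
        found.append("battery")
    if obs.lcd_shows_pcie_fault:
        found.append("PCIe slot or bus")
    if obs.hangs_with_disks_disconnected:
        found.append("controller")
    if not found:
        # With no other evidence, the comment above leans towards the card itself.
        found.append("controller (reseat it first)")
    return found


if __name__ == "__main__":
    print(suspects(Observations(hangs_with_disks_disconnected=True)))
```

With nothing flagged, the helper falls back to the controller, which matches the suggestion to reseat the card before buying anything.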
Max Nomad, IT Consultant

Author

Commented:
Thanks again for elaborating on the card type. The serial number is: 7N1NSL1

Here's the thing: the 6IR doesn't hang during boot. It loads Windows Server, lets me get past the login screen, and everything appears to be fine. Processes are running as normal, files are accessible, etc. Then within an hour or two the logical drives disappear, resulting in the crash. I'm not sure I understand how unplugging the cables and disks could be used to prove it.

Once I get to the office I'll be opening the server up to take a look inside to see what card is there.
Top Expert 2014

Commented:
JW063 is the part number. It may not be the controller, but if a drive went offline the array would keep running, and if that drive came back after a reboot it would show as foreign. It could be a firmware/BIOS problem, so it wouldn't hurt to upgrade, but it must have been stable for a long time now.

https://www.dell.com/support/home/my/en/mybsd1/product-support/servicetag/7n1nsl1/configuration has the original bill of materials when Dell built it.
Max Nomad, IT Consultant

Author

Commented:
I've ordered the part; unfortunately, it won't be here until Tuesday at the earliest. I'm going to leave this thread open just in case there are any new developments or possible solutions. Thanks again for your insights, Andy.
Max Nomad, IT Consultant

Author

Commented:
Andy, here's another question -- after I replace the 6IR card it should recognize the RAID that's in place, correct?
Top Expert 2014
Commented:
Dell PERCs (LSI cards with tweaked firmware) store the config on both the drives and the controller, and the two have to match. The easiest thing is to clear the card before using it by putting it in the server without the disks connected, going into the RAID BIOS (ctrl R), and using the option under controller settings to reset to default.

One thing I forgot: if the array is RAID 0 then it could be a failing disk (but it is very unlikely to be RAID 0).
Max Nomad, IT Consultant

Author

Commented:
So to make sure I've got this straight: I'll need to remove the old card, then put in the new card by itself (no HDD connected). After that I'll boot up the server, hit CTRL-R to get to the RAID BIOS, go to controller settings, reset to default, then shut down. Lastly I'll plug the drive cable in and boot up the server. Is that correct?

"One thing I forgot, if the array is RAID 0 then it could be a failing disk (but it is very unlikely it is RAID 0)"

It is RAID 1.
Max Nomad, IT Consultant

Author

Commented:
Below are three screenshots from within the SAS setup menu (CTRL-C), provided just in case there's something we've overlooked.
0712191008_1.jpg
0712191010.jpg
0712191017.jpg
Max Nomad, IT Consultant

Author

Commented:
Thank you, Andy. Your insight and advice were appreciated.
Max Nomad, IT Consultant

Author

Commented:
I bought another SAS card, and it showed me that the drives were in worse condition than was previously shown by the drive indicators and the card itself. The server crashed pretty hard and we're still recovering from it.
