• Status: Solved
  • Priority: Medium
  • Security: Public

Errors with Server Hardware

I have two identical servers, Server A and Server B. Server A keeps giving me a blue screen for a driver error, along with the screen I attached below. This happens at least once a week. Is it a RAM issue?

Server B had an issue with the hot swap drive. The drive was taken out and replaced, and now we get the error "no boot filename received". See attached.

Needless to say...HELP!
ServerA.jpg
Asked: renniscom
1 Solution
 
Perarduaadastra Commented:
The first thing I notice is that the servers seem to be rather elderly, if the BIOS dates are anything to go by. The second is that the BIOS version is the first production release (P01) if the board follows the Intel convention of the time. Upgrading to the latest revision might be useful at some point in the proceedings, but be aware that there can be caveats when updating the BIOS over a large number of revisions; always check the release notes that Intel provide with each update in case, for example, version 7 has to be installed before you can update to version 11.
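The stepping-stone caveat can be sketched in a few lines of Python. This is only an illustration: the function name, the integer version numbers, and the prerequisite table are hypothetical, and the real requirements must come from the release notes for the board.

```python
def plan_bios_updates(current, target, prerequisites):
    """Return the versions to flash, in order, to get from current to target.

    prerequisites maps a version to the version that must already be
    installed before it can be flashed (hypothetical data that would be
    read from the vendor's release notes).
    """
    steps = []
    goal = target
    # Walk backwards from the target, collecting required stepping stones.
    while goal > current:
        steps.append(goal)
        required = prerequisites.get(goal)
        if required is None or required <= current:
            break
        if required >= goal:
            raise ValueError(f"prerequisite for version {goal} is not older than it")
        goal = required
    return list(reversed(steps))


# The example from the comment: version 7 must be installed before version 11.
print(plan_bios_updates(1, 11, {11: 7}))  # [7, 11]
```

A system already at version 7 or later would get a one-step plan, since the prerequisite is already satisfied.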

The server A problem does look like faulty RAM, which the system has marked as such, and should be fixed before investigating the driver problem, as dodgy memory can compromise even the best behaved drivers. Is it always the same driver that is implicated, and if so which one is it, and what is the STOP error and heading?

The error on server B is a little simpler - the system can't find a boot device! The boot filename is needed for booting from the network rather than a local disk (or disk array) and by default is the last option in the boot device order. This means that it has tried to boot from the floppy drive, the optical drive, and the hard disk (or disk array) without success; when the last chance network saloon fails to provide boot files the system just gives up.
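The fall-through described above can be modelled roughly as follows. The device names and the function are made up for illustration; the actual order is whatever is configured in the BIOS setup.

```python
def find_boot_device(boot_order, bootable_devices):
    """Return (device, tried): the first device that boots, plus every device attempted."""
    tried = []
    for device in boot_order:
        tried.append(device)
        if device in bootable_devices:
            return device, tried
    # Nothing in the list could boot, so the system gives up.
    return None, tried


# Assumed default order, with network boot as the last resort.
BOOT_ORDER = ["floppy", "optical", "hard disk", "network"]

# Server B's situation: nothing bootable anywhere, so every device is tried.
device, tried = find_boot_device(BOOT_ORDER, bootable_devices=set())
print(device, tried)
```

In this model, the "no boot filename received" message only appears once every earlier device in the list has already failed, which is why it points at the disk or array rather than at the network.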

If the drives are hot-swap, go into the RAID adapter embedded configuration utility (accessible during the POST by pressing a key combination such as Ctrl-A or Ctrl-R - the exact key combination will be displayed on screen) and find out what's going on with the drive(s) in the array.

I've just re-read the question, and I note that you refer to "the" hot-swap drive, implying singular; if there is only one drive, and you've replaced it with a new one, that would explain the boot failure, as a new disk is unformatted and contains no files or filesystem. If this is the case, it also means that references to arrays don't apply, as there isn't one.

I really need more information about your server setup, as I'm just guessing at possible scenarios.
 
renniscom (Author) Commented:
Server A - I will check the RAM since I am noticing that the hard drive space has been vanishing

Server B - Apparently it had a bad hot swap drive, it was replaced, and now we are getting this message where the server does not even boot up.
 
Perarduaadastra Commented:
If you remove the newly replaced drive does the server boot up again?

What RAID controller are you using, and what RAID level? Are the drives SCSI or IDE?
 
renniscom (Author) Commented:
I'll have to try later today. If I am not mistaken, it's a RAID 5.

Is there another way to boot it up?
 
Perarduaadastra Commented:
If you're running RAID 5 then you will have a minimum of three drives in the array. If you only have two, then the array is either RAID 0 or RAID 1. If it's RAID 0 and a drive in it has failed then the array is permanently broken and any data on the remaining drive is useless because half the sectors containing data are on the dead drive. If it's RAID 1 then the array should rebuild by itself, though if the RAID controller is as old as the server then you may have to explicitly tell it to rebuild the array. Again, I need more information to be more specific in trying to help you.
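As a quick way to reason about the drive counts above, here is a minimal sketch covering only the three RAID levels mentioned in this thread. The table and function names are illustrative and do not come from any controller's API.

```python
# Minimum number of member drives for each level discussed in the thread.
MIN_DRIVES = {"RAID 0": 2, "RAID 1": 2, "RAID 5": 3}

# Levels that keep the data intact after a single drive failure.
SINGLE_FAILURE_TOLERANT = {"RAID 1", "RAID 5"}


def possible_levels(drive_count):
    """Return the levels (of those listed) that could be built from drive_count drives."""
    return [level for level, minimum in MIN_DRIVES.items() if drive_count >= minimum]


def survives_single_failure(level):
    """True if the array still holds its data after one member drive fails."""
    return level in SINGLE_FAILURE_TOLERANT


print(possible_levels(2))                 # ['RAID 0', 'RAID 1']
print(survives_single_failure("RAID 0"))  # False
print(survives_single_failure("RAID 5"))  # True
```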

If the drives are SCSI, is the new drive ID set to the same as that of the failed disk? Are there any terminators that need to be transferred from the old drive to the new one? If any of these criteria are needed and haven't been met then the array probably won't function at all. Are any error messages from the controller displayed during startup?
 
renniscom (Author) Commented:
I just rebooted it, this is what I get....


IMG00196-20101127-1253--2-.jpg
 
Perarduaadastra Commented:
Is this with or without the new drive installed?

You have an Intel SRCU42L integrated controller in your server, which is looking for a hotfix drive, as per section 4.3.3 of this document:

ftp://download.intel.com/support/motherboards/server/srcu42l/tps.pdf

It seems that the term "hotfix" means different things to different people... in any case, the controller isn't seeing the drive. This may be caused by the firmware version of the new drive being different from that of the others, which Intel has flagged up as a possible issue here:

http://www.intel.com/support/motherboards/server/sb/CS-006152.htm

As your controller firmware isn't the latest and nor, I suspect, are your drivers, see this page:

http://www.intel.com/support/motherboards/server/srcu42l/sb/cs-007030.htm

which lists the available firmware and drivers for the controller.

It's also possible that Seagate may have more recent firmware for the drive itself, but you would need to input the serial number of the drive here:

https://apps1.seagate.com/rms_af_srl_chk/

to discover if this is the case.

Hope this helps.

 
Perarduaadastra Commented:
To amend one of my earlier comments, the SCSI ID of the new drive doesn't have to be the same as the failed one, but it does need to be different from those of the other drives in the array and that of the controller itself.
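The uniqueness rule can be expressed as a small check. This is a sketch only: the function is hypothetical, and the controller ID of 7 is an assumption based on the usual default for SCSI host adapters, so confirm it in the controller setup.

```python
def scsi_id_problems(new_id, array_drive_ids, controller_id=7):
    """Return a list of reasons why new_id cannot be used (empty if it is fine).

    controller_id defaults to 7, the conventional host adapter ID (assumed here).
    """
    problems = []
    if new_id in array_drive_ids:
        problems.append(f"ID {new_id} is already used by another drive in the array")
    if new_id == controller_id:
        problems.append(f"ID {new_id} is the controller's own ID")
    return problems


# Hypothetical array members on IDs 0-3.
print(scsi_id_problems(4, [0, 1, 2, 3]))  # no conflicts
print(scsi_id_problems(2, [0, 1, 2, 3]))  # clashes with an existing drive
```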

Going back to your question, was the drive that failed actually the hotfix spare?

The screenshot that you just posted shows four drives detected by the controller, which presumably are members of the array as they are all in the same LUN. It appears that the controller is configured to expect a hotfix spare to be present, and when one is not detected it doesn't bring up the host drive (the actual volume that the system sees) but reports a failure and waits to be told what to do via the Storage Console. The first link I posted to Intel's PDF on the controller has a lot of additional information; have a look at sections 4.3.4-6 as well and see if any of those scenarios are congruent with yours.

If the new drive is not detected, or prevents the array from working, then I would be looking at its firmware version to see if it differs from that of the other array member drives, and thinking about upgrading the firmware of the RAID controller.
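If the firmware revision of each drive is noted down from the controller's console, spotting the odd one out is straightforward. The sketch below assumes the revisions have been collected by hand; the slot names and revision strings are invented for illustration.

```python
from collections import Counter


def firmware_outliers(firmware_by_drive):
    """Return the drives whose firmware differs from the most common revision."""
    if not firmware_by_drive:
        return {}
    most_common, _count = Counter(firmware_by_drive.values()).most_common(1)[0]
    return {drive: fw for drive, fw in firmware_by_drive.items() if fw != most_common}


# Hypothetical readings: four original members plus the replacement drive.
readings = {
    "slot 0": "0003",
    "slot 1": "0003",
    "slot 2": "0003",
    "slot 3": "0003",
    "replacement": "0005",
}
print(firmware_outliers(readings))  # {'replacement': '0005'}
```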
 
renniscom (Author) Commented:
Your info has been very detailed and insightful and I appreciate it greatly!

I will be verifying the info you posted tomorrow.

Thank you again.
 
renniscom (Author) Commented:
I put the original drive back in, repaired the array, and rebooted. After updating the firmware, we are back in business.

Thanks again!!
 
Perarduaadastra Commented:
My pleasure. I'm glad that you're up and running again.