Solved

Errors with Server Hardware

Posted on 2011-02-18
Last Modified: 2012-05-11
I have two identical servers, Server A and Server B. Server A keeps giving me a blue screen for a driver error, along with the screen attached below. This happens at least once a week. Is it a RAM issue?

Server B had an issue with the hot swap drive. The drive was taken out and replaced, and now we get the error "no boot filename received". See attached.

Needless to say...HELP!
ServerA.jpg
Question by:renniscom
 

Author Comment

by:renniscom
ID: 34930290
 
LVL 15

Expert Comment

by:Perarduaadastra
ID: 34930524
The first thing I notice is that the servers seem to be rather elderly, if the BIOS dates are anything to go by. The second is that the BIOS version is the first production release (P01) if the board follows the Intel convention of the time. Upgrading to the latest revision might be useful at some point in the proceedings, but be aware that there can be caveats when updating the BIOS over a large number of revisions; always check the release notes that Intel provide with each update in case, for example, version 7 has to be installed before you can update to version 11.
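As a rough illustration of the "version 7 before version 11" caveat, here is a minimal Python sketch that works out the order in which to flash updates. The function name `plan_bios_updates` and the `prerequisites` mapping are hypothetical; the real prerequisites have to come from the release notes for each update.

```python
from typing import Dict, List


def plan_bios_updates(current: int, target: int,
                      prerequisites: Dict[int, int]) -> List[int]:
    """Return the versions to flash, in order, to get from current to target.

    `prerequisites` (hypothetical data, read from the release notes) maps a
    version to the minimum version that must already be installed before
    that version can be flashed.
    """
    steps: List[int] = []
    goal = target
    # Walk backwards from the target, adding any required intermediate version.
    while goal > current:
        steps.append(goal)
        required = prerequisites.get(goal, current)
        if required >= goal:
            raise ValueError(f"version {goal} cannot require version {required}")
        if required <= current:
            break  # the installed version already satisfies the prerequisite
        goal = required
    return list(reversed(steps))


# Installed version 1, latest is 11, and 11 needs at least 7 installed first.
print(plan_bios_updates(1, 11, {11: 7}))  # [7, 11]
```

If the installed version already meets the prerequisite, the plan collapses to a single step.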

The server A problem does look like faulty RAM, which the system has marked as such, and should be fixed before investigating the driver problem, as dodgy memory can compromise even the best behaved drivers. Is it always the same driver that is implicated, and if so which one is it, and what is the STOP error and heading?

The error on server B is a little simpler - the system can't find a boot device! The boot filename is needed for booting from the network rather than a local disk (or disk array) and by default is the last option in the boot device order. This means that it has tried to boot from the floppy drive, the optical drive, and the hard disk (or disk array) without success; when the last chance network saloon fails to provide boot files the system just gives up.
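The fall-through described above can be sketched in a few lines of Python. This is only a model of the behaviour, not firmware code; the device names and the `first_bootable` helper are made up for illustration.

```python
from typing import Dict, List, Optional


def first_bootable(boot_order: List[str],
                   bootable: Dict[str, bool]) -> Optional[str]:
    """Return the first device in the boot order that can actually boot."""
    for device in boot_order:
        if bootable.get(device, False):
            return device
    return None  # every option failed, so the system gives up


# Assumed default order, with network boot as the last option.
boot_order = ["floppy", "optical", "disk array", "network"]

# Server B as described: nothing local is bootable, and no boot server
# answers on the network, which is when the boot filename error appears.
status = {"floppy": False, "optical": False,
          "disk array": False, "network": False}

print(first_bootable(boot_order, status))  # None
```

Once the array is healthy again, the same call returns "disk array" and the network option is never reached.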

If the drives are hot-swap, go into the RAID adapter's embedded configuration utility (accessible during the POST by pressing a key combination such as Ctrl-A or Ctrl-R; the exact key combination will be displayed on screen) and find out what's going on with the drive(s) in the array.

I've just re-read the question, and I note that you refer to "the" hot-swap drive, implying singular; if there is only one drive, and you've replaced it with a new one, that would explain the boot failure, as a new disk is unformatted and contains no files or filesystem. If this is the case, it also means that references to arrays don't apply, as there isn't one.

I really need more information about your server setup, as I'm just guessing at possible scenarios.
 

Author Comment

by:renniscom
ID: 34931186
Server A - I will check the RAM since I am noticing that the hard drive space has been vanishing

Server B - Apparently it had a bad hot swap drive, it was replaced, and now we are getting this message where the server does not even boot up.
 
LVL 15

Expert Comment

by:Perarduaadastra
ID: 34932387
If you remove the newly replaced drive does the server boot up again?

What RAID controller are you using, and what RAID level? Are the drives SCSI or IDE?
 

Author Comment

by:renniscom
ID: 34933016
I'll have to try later today. If I am not mistaken, it's a RAID 5.

Is there another way to boot it up?
 
LVL 15

Expert Comment

by:Perarduaadastra
ID: 34933089
If you're running RAID 5 then you will have a minimum of three drives in the array. If you only have two, then the array is either RAID 0 or RAID 1. If it's RAID 0 and a drive in it has failed then the array is permanently broken and any data on the remaining drive is useless because half the sectors containing data are on the dead drive. If it's RAID 1 then the array should rebuild by itself, though if the RAID controller is as old as the server then you may have to explicitly tell it to rebuild the array. Again, I need more information to be more specific in trying to help you.
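To make the drive-count reasoning concrete, here is a small Python sketch covering only the three RAID levels mentioned above. The table and helper names are illustrative, and the RAID 1 entry assumes a simple two-drive mirror.

```python
from typing import List

# Minimum member drives and the number of drive failures each level survives.
RAID_LEVELS = {
    "RAID 0": {"min_drives": 2, "tolerated_failures": 0},  # striping only
    "RAID 1": {"min_drives": 2, "tolerated_failures": 1},  # two-drive mirror
    "RAID 5": {"min_drives": 3, "tolerated_failures": 1},  # striping + parity
}


def possible_levels(drive_count: int) -> List[str]:
    """List the levels that an array with this many drives could be."""
    return [level for level, spec in RAID_LEVELS.items()
            if drive_count >= spec["min_drives"]]


def survives(level: str, failed_drives: int) -> bool:
    """True if the array's data is still intact after the given failures."""
    return failed_drives <= RAID_LEVELS[level]["tolerated_failures"]


print(possible_levels(2))     # ['RAID 0', 'RAID 1']
print(survives("RAID 0", 1))  # False
print(survives("RAID 5", 1))  # True
```

Surviving a failure only means the data is still there; as noted above, an older controller may still need to be told to rebuild.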

If the drives are SCSI, is the new drive ID set to the same as that of the failed disk? Are there any terminators that need to be transferred from the old drive to the new one? If any of these criteria are needed and haven't been met then the array probably won't function at all. Are any error messages from the controller displayed during startup?
 

Author Comment

by:renniscom
ID: 34933220
I just rebooted it, and this is what I get:

IMG00196-20101127-1253--2-.jpg
 
LVL 15

Accepted Solution

by:Perarduaadastra (earned 500 total points)
ID: 34934006
Is this with or without the new drive installed?

You have an Intel SRCU42L integrated controller in your server, which is looking for a hotfix drive, as per section 4.3.3 of this document:

ftp://download.intel.com/support/motherboards/server/srcu42l/tps.pdf

It seems that the term "hotfix" means different things to different people... in any case, the controller isn't seeing the drive. This may be caused by the firmware version of the new drive being different from that of the others, which Intel has flagged up as a possible issue here:

http://www.intel.com/support/motherboards/server/sb/CS-006152.htm

As your controller firmware isn't the latest and nor, I suspect, are your drivers, see this page:

http://www.intel.com/support/motherboards/server/srcu42l/sb/cs-007030.htm

which lists the available firmware and drivers for the controller.

It's also possible that Seagate may have more recent firmware for the drive itself, but you would need to input the serial number of the drive here:

https://apps1.seagate.com/rms_af_srl_chk/

to discover if this is the case.

Hope this helps.

 
LVL 15

Expert Comment

by:Perarduaadastra
ID: 34935050
To amend one of my earlier comments, the SCSI ID of the new drive doesn't have to be the same as the failed one, but it does need to be different from those of the other drives in the array and that of the controller itself.
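The ID rule above is easy to express as a check. The following Python sketch is illustrative only; the function name is made up, and the example IDs (drives on 0 to 2, controller on 7) are assumptions rather than values read from this server.

```python
from typing import Iterable, List


def scsi_id_conflicts(new_id: int, other_drive_ids: Iterable[int],
                      controller_id: int) -> List[str]:
    """Return what the new drive's SCSI ID clashes with (empty if nothing)."""
    conflicts: List[str] = []
    if new_id == controller_id:
        conflicts.append("controller")
    if new_id in set(other_drive_ids):
        conflicts.append("another drive")
    return conflicts


# Assumed example: existing drives on IDs 0, 1 and 2, controller on ID 7.
print(scsi_id_conflicts(3, [0, 1, 2], 7))  # []
print(scsi_id_conflicts(1, [0, 1, 2], 7))  # ['another drive']
```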

Going back to your question, was the drive that failed actually the hotfix spare?

The screenshot that you just posted shows four drives detected by the controller, which presumably are members of the array as they are all in the same LUN. It appears that the controller is configured to expect a hotfix spare to be present, and when one is not detected it doesn't bring up the host drive (the actual volume that the system sees) but reports a failure and waits to be told what to do via the Storage Console. The first link I posted to Intel's PDF on the controller has a lot of additional information; have a look at sections 4.3.4-6 as well and see if any of those scenarios are congruent with yours.

If the new drive is not detected, or prevents the array from working, then I would be looking at its firmware version to see if it differs from that of the other array member drives, and thinking about upgrading the firmware of the RAID controller.
 

Author Comment

by:renniscom
ID: 34939353
Your info has been very detailed and insightful and I appreciate it greatly!

I will be verifying the info you posted tomorrow.

Thank you again.
 

Author Comment

by:renniscom
ID: 34946982
I put the original drive back in, repaired the array, and rebooted. I then updated the firmware, and we are back in business.

Thanks again!!
 
LVL 15

Expert Comment

by:Perarduaadastra
ID: 34947287
My pleasure. I'm glad that you're up and running again.