

Faulty H/W components on Unix/Linux servers?

Posted on 2011-10-09
Medium Priority
Last Modified: 2012-05-12
Friends, I hope all is well. Could you please describe the risks to day-to-day server operation posed by the following faulty hardware components on Unix/Linux servers?

Please provide detailed information and explain how the components are related. I am willing to open another question for more points.

Faulty Hard Disk
Faulty CPU  
Faulty Memory DIMM  
Faulty Power supply
Faulty CPU Fan
Faulty SAN card (HBA) replacement
Faulty NIC card  

I am relatively new to this area and am trying to understand the what and the why.

Thanks, OM
Question by:Oramcle
LVL 99

Assisted Solution

by:John Hurst
John Hurst earned 200 total points
ID: 36939085
Assuming your server is business-critical, keep it under a maintenance contract and have skilled technicians replace failed parts. Replacement parts should normally be exact matches for the originals.

For a faulty hard disk, use RAID 5. Assuming you use a solid RAID configuration and have good hardware, a bad disk will show up either as errors in the RAID management tool or, if serious enough, as a red light on the drive. Either way, you get an exact replacement, and the drives are normally user-serviceable. Pull out the ONE bad drive, replace it and rebuild the array. RAID 5 survives only a single drive failure, so let us hope you have good backups and do not have two drives fail at once.
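
To illustrate why a RAID 5 array can rebuild one lost drive but not two, here is a minimal Python sketch of the XOR parity idea. The block contents are toy values chosen for illustration, and a real controller stripes data and rotates parity across all drives, which this sketch does not model.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks in one stripe (toy values, not real disk contents).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

With two blocks missing there is only one parity equation for two unknowns, which is why a second failure during the rebuild loses the array.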

... Thinkpads_User

Author Comment

ID: 36939263
Thank you, Thinkpads_User. I am looking for a list of risk factors for each hardware component.
LVL 99

Expert Comment

by:John Hurst
ID: 36939825
I don't think there is much in the way of risk factors for each component. They all have (or should have) mean time between failures (MTBF) numbers; see the manufacturers' specifications. But these are just statistics and say nothing about an individual part, which can fail at any time without warning. ... Thinkpads_User
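
As a rough illustration of what an MTBF figure does and does not say, the Python sketch below converts an MTBF into the probability that one part fails within a year. It assumes a constant failure rate (an exponential lifetime model), which is a simplification, and the 1,000,000-hour MTBF is a hypothetical figure, not a number from any datasheet.

```python
import math

HOURS_PER_YEAR = 24 * 365


def failure_probability(mtbf_hours, period_hours):
    """Chance that one part fails within period_hours.

    Assumes a constant failure rate (exponential model), which ignores
    infant mortality and wear-out.
    """
    return 1.0 - math.exp(-period_hours / mtbf_hours)


# Hypothetical part with an MTBF of 1,000,000 hours.
p_year = failure_probability(1_000_000, HOURS_PER_YEAR)
print(f"Chance of failure within one year: {p_year:.2%}")
```

The result is a probability across many parts of the same type; it cannot say which individual part will fail or when.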

Accepted Solution

eager earned 1800 total points
ID: 36943178
Generally, things with moving parts have higher failure rates than things without moving parts.  But that is, as said, a generality.  A hard drive designed for server applications may have a much longer MTBF than a power supply targeted at cheap desktops and built with marginal parts.

Risk factors:
Hard drives wear out over time.  Noise and vibration can damage the disk surface.  (Use SMART monitoring to track hard drive condition.)
Power supplies have fans and capacitors which fail, especially if stressed close to their specifications.
Motherboards or added cards can have capacitors which fail if overstressed or of poor quality.
Semiconductors (CPUs, DIMMs, etc.) have a very long MTBF unless overstressed by heat or excess voltage.  That stress can come from other components which become marginal without failing outright, for example a fan which doesn't move enough air over the CPU, or filters which become clogged with dust.  Bad capacitors can allow voltage spikes to reach sensitive components, which eventually causes them to fail.
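
For the SMART monitoring mentioned above, one possible approach is to scan the attribute table for sector-related counters with a nonzero raw value. The Python sketch below parses text laid out like the attribute table printed by `smartctl -A` (from smartmontools). The sample text is made up for illustration, the choice of attributes to watch is an assumption, and vendors differ in which attributes they report and how.

```python
# Attributes often treated as early warning signs (an assumption, not a standard).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

# Made-up sample in the layout of a `smartctl -A` attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13321
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""


def worrying_attributes(text):
    """Return watched attributes whose raw value is greater than zero."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


print(worrying_attributes(SAMPLE))
```

In practice the text would come from running the tool against a real device, and a rising count over time matters more than any single reading.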

On a day-to-day basis, a well-designed server from a reputable manufacturer has an MTBF in the tens of thousands of hours.  You can probably expect more failures in hard drives and fans than in the other components.  As @Thinkpads_User suggests, putting your hard drives in a RAID configuration will compensate for this and allow faster recovery.  Other than hard drives, I think that most failures will appear to be random.  A company like Google, which has perhaps a million servers, will see failure patterns.  For an individual server, there's too much individual variability.
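
The contrast between one server and a very large fleet can be made concrete with a back-of-the-envelope calculation. This Python sketch assumes a constant failure rate and a hypothetical MTBF of 50,000 hours (one value within "tens of thousands of hours"); the million-server fleet is the rough figure from the post, not a measured number.

```python
HOURS_PER_YEAR = 24 * 365


def expected_failures(servers, mtbf_hours, period_hours):
    """Average number of failures across a fleet over period_hours.

    Assumes a constant failure rate and that failed units are replaced.
    """
    return servers * period_hours / mtbf_hours


MTBF_HOURS = 50_000  # hypothetical value, for illustration only

one_server_year = expected_failures(1, MTBF_HOURS, HOURS_PER_YEAR)
fleet_per_day = expected_failures(1_000_000, MTBF_HOURS, 24)

print(f"One server, one year: {one_server_year:.2f} expected failures")
print(f"Million servers, one day: {fleet_per_day:.0f} expected failures")
```

A fraction of a failure per year looks random on a single machine, while hundreds of failures a day give a large operator enough data to see patterns.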

If you require high availability (see http://en.wikipedia.org/wiki/High_availability) you can use a server with multiple power supplies, multiple fans, redundant NICs, CPUs configured to fail over, etc.  
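
To see roughly why redundant components help, the Python sketch below estimates the availability of a set of parallel components where any one working copy is enough. It assumes the copies fail independently, which real power supplies or NICs sharing a chassis may not, and the 99% single-unit availability is a hypothetical input.

```python
HOURS_PER_YEAR = 24 * 365


def redundant_availability(single, copies):
    """Availability of `copies` parallel units when any one suffices.

    Assumes independent failures; shared causes (heat, power, firmware)
    would make the real figure lower.
    """
    return 1.0 - (1.0 - single) ** copies


def downtime_hours_per_year(availability):
    """Expected hours per year the component set is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


single = 0.99  # hypothetical availability of one power supply
for copies in (1, 2):
    a = redundant_availability(single, copies)
    print(f"{copies} unit(s): {a:.4%} available, "
          f"{downtime_hours_per_year(a):.2f} h/year down")
```

Under these assumptions a second unit cuts expected downtime by about a factor of one hundred, which is the basic argument for dual power supplies and bonded NICs.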

Author Closing Comment

ID: 36948160
Thank you all.
