  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1132

HP Proliant DL580 G2 with Windows Server 2003 Enterprise Edition SP2 reboots periodically

I have a ProLiant DL580 G2 that keeps going into ASR shutdowns.  I've dealt with Microsoft Premier Support, and they state it's not their issue because I can't get the server to generate a memory dump when it locks up.  By their reasoning, the issue is hardware-related.  The SmartStart diagnostics (I ran the full test) reveal no issues on any hardware in the server, and all drivers and firmware are current.  In the process of attempting to fix this issue, I've replaced everything in the server save for the SCSI backplane, the array controller and the power supplies.  The server's temperature is stable and nowhere close to being elevated.

The server in question is a DR box; the production server has identical hardware but doesn't exhibit the same problem.  Both servers have VMware GSX installed and run two VMware guest server sessions apiece.

Additionally, the WMI Performance Adapter service, whose startup type is set to Manual and which is normally stopped, keeps starting and stopping.  I'm looking into what could be causing that, but the production (working) server shows the same WMI symptom and doesn't lock up.  Last, I've updated the symevent.sys file, as the server has Symantec Antivirus installed.  If anyone could offer some insight as to what I could try next, I'd appreciate it.
Asked:
jgerstner74
3 Solutions
 
BogdanSUA Commented:
Before you try anything, verify that the BIOS and firmware levels on all your components are good.

During a scheduled maintenance window, shut down both boxes.

Remove the disks from each box (but label them and note their positions).
Take the disks from your DR box and pop them into your production box.
See if it reboots, crashes, etc.
If so, you have a software related problem.
If not, it could be a hardware related problem.

If the above didn't (and it shouldn't) cause any problems when booting up, pop in the drives from your production server into your DR box.

Does the problem reoccur?  
If so, you have a hardware problem.  Maybe a bad power supply... do you have dual power supplies?  Maybe it's the juice coming from the wall... Is at least one of them plugged into a UPS?
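
The swap test above reduces to a small decision table. Here is a minimal sketch in Python; the function and argument names are hypothetical, and the "inconclusive" branch is an added assumption for the case where neither swap reproduces the fault:

```python
def diagnose(dr_disks_crash_in_prod: bool, prod_disks_crash_in_dr: bool) -> str:
    """Interpret the two disk-swap tests described above.

    Both argument names are hypothetical labels for what you observed:
    - dr_disks_crash_in_prod: DR disks rebooted/crashed in the production chassis
    - prod_disks_crash_in_dr: production disks rebooted/crashed in the DR chassis
    """
    if dr_disks_crash_in_prod:
        # The fault followed the disks (the OS image), so suspect software.
        return "software"
    if prod_disks_crash_in_dr:
        # Known-good disks fail in the DR chassis, so suspect hardware or power.
        return "hardware"
    # Assumption: neither swap reproduced the fault, so keep observing.
    return "inconclusive"


print(diagnose(False, True))
```

The order of the checks mirrors the order of the swaps: the production disks only go into the DR box once the DR disks have run cleanly in the production box.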
 
jgerstner74 (Author) Commented:
The servers are approximately 300 miles apart from each other in different data centers, so I don't think I can try swapping the hard drives.  I apologize for not mentioning this previously.  The BIOS as well as all drivers and firmware are current on the problem server.  I will look into replacing the power supplies as your suggestion is in line with suggestions I've found elsewhere as well.  Thank you for the suggestion.
 
BogdanSUA Commented:
No problem, but you still have some other options while you wait for the power supplies.  

Download the memtest.iso, burn it to a CD (or mount it through iLO if you've got it) and let it go to town on your hardware.  If it reboots/crashes, then you've got a HW problem.

BTW, do you have a UPS connected to the server?  It could be that the power supplies are fine, but the electricity dips for a split second to unacceptable voltage levels.  Do you have any other equipment in the room connected to the same source (power strip/PDU) that might tell you if it rebooted?

LOL - Some humor - Dust off that VCR, set its time, plug it in, and see if it starts blinking 12:00.  :)
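
In the same spirit as the VCR, a software version of "blinking 12:00" is a heartbeat log: something on the same power strip/PDU writes a timestamp every minute, and you look for holes afterwards. A rough sketch in Python, assuming you already have the timestamps in a list; the function name, the one-minute interval, the 30-second slack and the sample data are all made up for illustration:

```python
from datetime import datetime, timedelta


def find_gaps(beats, interval=timedelta(minutes=1), slack=timedelta(seconds=30)):
    """Return (last_seen, next_seen) pairs where the heartbeat log has a hole."""
    ordered = sorted(beats)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        # A hole longer than one interval plus slack suggests the box was down.
        if cur - prev > interval + slack:
            gaps.append((prev, cur))
    return gaps


# Made-up sample data: beats at 12:00, 12:01, 12:02, then nothing until 12:07.
start = datetime(2008, 1, 1, 12, 0)
beats = [start + timedelta(minutes=m) for m in (0, 1, 2, 7, 8)]
for last_seen, next_seen in find_gaps(beats):
    print(f"no heartbeat between {last_seen:%H:%M} and {next_seen:%H:%M}")
```

If the holes on a second device line up with the server's lockups, that points at the power feed rather than the server itself.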
 
omic_admin Commented:
Check the UPS and make sure that it is also conditioning the lines. I've seen issues with equipment rebooting or shutting down periodically when the power is not clean.
 
markzz Commented:
Are you sure this is an ASR, as in HP ASR (Automatic Server Recovery)?
If so it will log info into the hardware logs indicating what has happened.
You can view this via https://hostname:2381/
You can also set your ASR timeout in BIOS to increase or decrease the timeout limit. A true ASR will occur if the hardware doesn't receive a poll from the software driver.
The exception is a fault on the first CPU (CPU1, or CPU0 depending on how you count them), as that CPU is used by the system for monitoring, etc.
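
To make the poll/timeout idea concrete, here is a toy model of a watchdog timer in Python. It is only a sketch of the general technique, not HP's ASR implementation; the class name, the method names and the 600-second timeout are all hypothetical:

```python
class WatchdogTimer:
    """Toy model of a poll-or-reset watchdog; not HP's actual ASR code."""

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.timeout_s = timeout_s
        self.last_poll = now

    def poll(self, now: float) -> None:
        """Called by the (hypothetical) OS driver while the system is healthy."""
        self.last_poll = now

    def expired(self, now: float) -> bool:
        """True once no poll has arrived for the whole timeout window."""
        return now - self.last_poll >= self.timeout_s


wd = WatchdogTimer(timeout_s=600)  # 600 s is an illustrative value only
wd.poll(now=100)                   # driver checks in while the OS is alive
print(wd.expired(now=500))         # False: last poll was 400 s ago
print(wd.expired(now=700))         # True: 600 s of silence, hardware would reset
```

This is why a hung OS produces a reboot without a memory dump: the driver stops polling, and the reset comes from the hardware side once the window runs out.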
 
jgerstner74 (Author) Commented:
After checking with the asset group at my organization, we've decided to replace the server with a new one.  Nonetheless, thank you all for your input.  I've tried to split the points as evenly as I can, but as 250/3=83.33 and I can't award fractional points, I gave the extra point to BogdanSUA.