Solved

HP SBS 2008 server rebooting

Posted on 2010-09-02
Last Modified: 2012-05-10
Dear Experts -
I've got a horrible one and would like to know if anyone else has experienced this.
2 x clients (Client 1 and 2), and the issue started at the same time on both
2 x HP ML350 G6 with RAID 1 for the OS and RAID 5 for data
Both have Windows SBS 2008 SP2 fully patched, including Exchange 2007 SP2 Update 4 and SQL 2005 SP3 post patches
Both use BES Express for BlackBerry enterprise
Both run ESET mail server AV, and we are past the terrible issue that ESET caused its worldwide users on 2nd Sept
http://www.thinq.co.uk/2010/9/2/eset-nod32-antivirus-pains/
http://kb.eset.com/esetkb/index?page=content&id=NEWS99

Both servers started to reboot about 14 days ago, and the reboots have become more frequent every day. They happen whether or not a USB external backup drive is attached.
I thought I had cracked it when I found an HP tech page stating that the 15th May BIOS firmware (D22 version) was required due to memory instability on the DIMMs (both clients have 8 GB), so I installed all the HP PSP firmware and software updates, to no avail.
HP ASR is switched on, so it forces the reboots. I will find out tomorrow how to switch it off so I can see how and why the server hangs.
I will also have to use the resource kit to analyse the minidump created in system32 every time a reboot occurs and see if that gives me any clues.
http://support.microsoft.com/kb/315263
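Before opening dumps in a debugger, it can help to see how many have been written and when, so that each one can be matched to a reboot. The following is a minimal sketch, not part of the original post: the function name `list_dumps` is hypothetical, and the path assumes the Windows default small-dump folder (%SystemRoot%\Minidump), so adjust it if the dumps are written elsewhere.

```python
from datetime import datetime
from pathlib import Path


def list_dumps(dump_dir):
    """Return (name, modified time, size in bytes) for each .dmp file, newest first."""
    directory = Path(dump_dir)
    if not directory.is_dir():
        return []
    entries = []
    for path in directory.glob("*.dmp"):
        info = path.stat()
        entries.append((path.name, datetime.fromtimestamp(info.st_mtime), info.st_size))
    # Sort on the modification time so the most recent crash comes first.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


if __name__ == "__main__":
    # Assumed default location of small memory dumps; change as needed.
    for name, when, size in list_dumps(r"C:\Windows\Minidump"):
        print(f"{when:%Y-%m-%d %H:%M:%S}  {size:>10}  {name}")
```

The timestamps can then be compared with the reboot times in the System event log before the dumps are analysed.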

It's a bad one.
I have another client (Client 3) who has an HP DL380 with exactly the same config minus BES Express, and his server is fine with USB drives attached or detached.
But I don't think it's BES Express, as Client 1 has been running BES since Feb and Client 2 since July.
Any help appreciated
Regards
Rob
Question by:RobKanj
16 Comments
 
LVL 35

Expert Comment

by:Cris Hanna
ID: 33593644
Have you run memtest? In the vast majority of cases, constant reboots mean faulty memory. You are running ECC memory in those servers, correct?
 
 
LVL 76

Expert Comment

by:Alan Hardisty
ID: 33594017
I had a customer with a brand new ML350 G6 which began rebooting randomly.  It turned out to be the RAID Controller Memory Module.
I spent 2 hours talking to HP and then they sent an engineer to replace the motherboard, which didn't work. They came back the next day with a memory module for the RAID controller, and that fixed it.
 
LVL 15

Expert Comment

by:Dave_AND
ID: 33594441
Have you got the "HP System Management Homepage" installed and working? This will give you a good indication of which hardware is faulty.
 

Author Comment

by:RobKanj
ID: 33597043
Dear Cris, Alan and Dave
a) How do I do a memory check? I pressed F9, went into the RBSU and was able to disable ASR. Yes, I have 4 x 2 GB ECC DIMMs. I could not find a memory check anywhere in the F9 menus.
b) About to ring HP now.
c) HP System Management Homepage is installed and working with the latest PSP 8.5 updates. The only exclamation mark I have is on the iLO, because it is not configured: I have not enabled it and it has an IP address of 0.0.0.0.
Will keep you all updated.
Best Rob
 
 
 
LVL 15

Expert Comment

by:Dave_AND
ID: 33597157
http://www.ultimatebootcd.com/ is good for testing RAM (and other parts) and http://hcidesign.com/memtest/ is good for testing from within Windows.

Usually even bad RAM will show up in the HP software. If not, boot off the SmartStart CD and run a hardware test; this can uncover things too. (You will need 8.3 because it's broken in 8.4, or at least it was when I last downloaded the CD.)
 
LVL 76

Expert Comment

by:Alan Hardisty
ID: 33597251
How old is the server?  Is it under the standard HP Warranty?  If so - call HP and troubleshoot it with them.
Have you installed the ProLiant Support Pack and thus all the necessary diagnostics tools from HP?
 

Author Comment

by:RobKanj
ID: 33604386
Hi all
Some updates:
Yes, I had the ProLiant Support Pack installed and ran diagnostics - all fine.
I logged a call with HP and they advised me to install HP firmware update 9.10c, released 25th August 2010. See link:
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=3884315&swItem=MTX-6eea1331b05640b68adaa764dd&prodNameId=3884316&swEnvOID=4024&swLang=13&taskId=135&mode=4&idx=1
I also asked them whether my 4 x 2 GB DIMMs were installed in the correct bays, and they confirmed all OK.
I also found this
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?contentGroup=NT_DT_CustomerAdvisory&lang=en&cc=us&docIndexId=64174&taskId=135&prodTypeId=15351&prodSeriesId=3884315
I will be installing HP firmware update 9.10c and also completing the memory checks at both sites tomorrow. If the server still reboots after the update, HP have provided a link to upload various logs, and an engineer will be sent out with new parts (depending on what the logs say!)
Thanks so far
 
LVL 76

Expert Comment

by:Alan Hardisty
ID: 33604464
There are some Firmware updates that the PSP won't install.  I had to install a couple on the server they assisted me with - sadly it didn't help me due to dodgy memory.
Fingers crossed HP can pinpoint the problem.

 

Author Comment

by:RobKanj
ID: 33618354
Hi All,
HP have exhausted their troubleshooting. After running various diagnostic reports and uploading their HPSReport collation, they say all hardware is working at 100%.

So I turned my attention to Microsoft and logged a PSS call.
Again, lots of uploaded logs and perfmon counters. His first suggestion today is to update the drivers listed in the attached spreadsheet. He said to go to each individual manufacturer's website, download the relevant drivers and update them. (Please see spreadsheet)
a) Does this sound reasonable?
b) How would I find each hardware component within Windows so I can update its driver?

CHAPTER-UPDATE-DRIVERS.xlsx
 
LVL 76

Expert Comment

by:Alan Hardisty
ID: 33618416
That's a very big list of drivers, with lots for the same thing (e.g. RAID). Is that what is supposedly installed on your server, or just a list of random drivers, of which you might have one or two that need updating?
 

Author Comment

by:RobKanj
ID: 33618718
Apparently these are all the drivers on the server. The technician extracted them from one of his reports.
As you know, HP publish the latest drivers on their website as part of the PSP, and all of this has been applied.
I'm guessing the Brother, Adaptec and LSI drivers are the ones I need to go searching for and install?
 

Author Comment

by:RobKanj
ID: 33658687
Quick update - HP are 99% sure it's not a hardware / driver / support issue. I have logged the call with MS. They took a whole bunch of logs but have been a bit slow in getting the crash dump file.
To generate a crash dump file if a server locks up and CTRL+ALT+DEL does not respond: hold down the CTRL key on the right side of the keyboard and press Scroll Lock twice. This forces the crash dump to be written to C:\windows (default path). It has been uploaded to them via FTP and hopefully they will get back to me in the next two days. Both servers are crashing....
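The post does not say how the keyboard-triggered dump was enabled. On Windows it is normally switched on with the CrashOnCtrlScroll registry value, followed by a restart. The sketch below only builds the text of a .reg file for review; it changes nothing on the machine. The function name `build_reg_file` is hypothetical, and whether the `kbdhid` (USB keyboard) key is honoured depends on the Windows version and patch level, so treat that part as an assumption to check against Microsoft's documentation.

```python
# Registry services for PS/2 (i8042prt) and USB (kbdhid) keyboards.
KEYBOARD_SERVICES = ("i8042prt", "kbdhid")
SERVICES_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"


def build_reg_file(services=KEYBOARD_SERVICES):
    """Return .reg file text that sets CrashOnCtrlScroll=1 for each keyboard service."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for service in services:
        lines.append(f"[{SERVICES_KEY}\\{service}\\Parameters]")
        lines.append('"CrashOnCtrlScroll"=dword:00000001')
        lines.append("")
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Print for inspection; save the output as a .reg file and import it manually.
    print(build_reg_file())
```

Generating the file rather than writing to the registry directly keeps the change reviewable before it is applied to a production server.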
 
 
LVL 76

Expert Comment

by:Alan Hardisty
ID: 33664033
Hopefully MS will come back with something useful soon.  I am still tuned in - will wait for some feedback.
Alan
 

Author Comment

by:RobKanj
ID: 33681320
Now MS says it's a SharePoint issue....onetutil.dll
Will keep you all posted.
 

Author Comment

by:RobKanj
ID: 33784865
Still going on...still rebooting...MS have tried 4 different avenues and still no joy...meanwhile I have built a second server and will probably migrate users this weekend.
 

Accepted Solution

by:RobKanj
ID: 33886458
I found the solution by examining the crash dump files after enabling the NMI switch.
ESET Mail Security 4.2 conflicts with Windows Update patches installed during August 2010 (when the problem started), and the epfwwfpr.sys file in c:\windows\system32\drivers causes the server to crash.
This .sys file needs to be renamed in safe mode. There is an ongoing call between ESET and Microsoft over whose fault it is.
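The rename can be done by hand in Explorer or at a command prompt. For anyone repeating it across several servers, here is a minimal sketch; the function name `disable_driver` and the `.old` suffix are illustrative choices, not something ESET or Microsoft prescribe. It keeps the original file under a new name so the change can be reversed, and it refuses to overwrite an earlier backup.

```python
from pathlib import Path


def disable_driver(driver_path, suffix=".old"):
    """Rename a driver file by appending a suffix and return the new path."""
    source = Path(driver_path)
    if not source.is_file():
        raise FileNotFoundError(f"driver not found: {source}")
    target = source.with_name(source.name + suffix)
    if target.exists():
        raise FileExistsError(f"backup already exists: {target}")
    source.rename(target)
    return target


# Example (run from an elevated prompt in safe mode):
# disable_driver(r"C:\Windows\System32\drivers\epfwwfpr.sys")
```

Renaming the file back to its original name restores the driver once a fixed version is available.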
Resolved
