HP 360 G5 Errors

CHI-LTD
Hi All

Got the following on an HP 360 G5 box over the weekend by email:
Event Name: (SNMP) Advanced ECC Memory Engaged (6052)
URL: http://server/domain:280/mxportal/MxContextLaunch.jsp?systems=server&tool=System%20Page
Event originator: 'server'
Event Severity: Major
Event received: 02-Jul-2010, 21:41:35
 
Event description: Advanced Memory Protection Advanced ECC Memory Engaged.  The Advanced Memory Protection subsystem has detected a memory fault. Advanced ECC has been activated.  User Action: Replace the faulty memory.

and on iLO:
Internal Health LED:   Degraded

and on HP SIM:
Event Details: (SNMP) Corrected \ uncorrectable Memory Errors  Replace Memory Module. (6064)
Event Identification and Details
Event Severity   Minor
Cleared Status Not cleared
Event Source server
Associated System server
Associated System Status   Normal
Event Time Fri, 02/07/2010, 21:41 BST
Description Corrected \ uncorrected Memory Errors Detected The errors have been corrected, but the memory module should be replaced. Value 0 for CPU means memory is not Processor based
Event Category Unassigned
Assignee    
Comments    

Trap Details
Variable Description Value
NO DATA  
An administratively-assigned name for this managed node. By convention, this is the node's fully-qualified domain name. server  
The Trap Flags. This is a collection of flags used during trap delivery. Each bit has the following meaning: Bit 5-31: RESERVED: Always 0. Bit 2-4: Trap Condition 0= Not used (for backward compatibility) 1= Condition unknown or N/A 2= Condition ok 3= Condition degraded 4= Condition failed 5-7= reserved Bit 1: Client IP address type 0= static entry 1= DHCP entry Bit 0: Agent Type 0= Server 1= Client NOTE: bit 31 is the most significant bit, bit 0 is the least significant. 0  
The slot in which the memory board or cartridge is installed. A value of 0 indicates memory installed directly on the system board. 0  
The memory module CPU number. Value 0 means memory is not Processor based. 0  
The memory module riser number. 0  
The memory module number. 5  
The memory module's manufacturer part number. This field will be a null (size 0) string if the manufacturer part number is not available.  
Module memory size in kilobytes. A kilobyte of memory is defined as 1024 bytes. A size of 0 indicates the module is not present. 1048576  
A Server System ID. This value is used to uniquely identify systems via a unique ID on systems that do not support the EISA bus. CPQ0763  
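
For anyone reading the trap variables above, here is a minimal sketch of how the Trap Flags bit layout and the module size could be decoded, going only by the field descriptions quoted in the trap. The function and dictionary names (`decode_trap_flags`, `module_size_gib`, `TRAP_CONDITIONS`) are made up for illustration and are not part of any HP tool or library.

```python
# Sketch only: decodes the fields as described in the trap text above.
# All names here are hypothetical helpers, not an HP API.

# Bits 2-4 of the Trap Flags, per the trap description.
TRAP_CONDITIONS = {
    0: "not used",
    1: "unknown or N/A",
    2: "ok",
    3: "degraded",
    4: "failed",
}


def decode_trap_flags(flags: int) -> dict:
    """Split the Trap Flags integer into its documented fields."""
    agent_type = "client" if flags & 0x1 else "server"        # bit 0
    ip_type = "DHCP" if (flags >> 1) & 0x1 else "static"      # bit 1
    condition_code = (flags >> 2) & 0x7                       # bits 2-4
    condition = TRAP_CONDITIONS.get(condition_code, "reserved")  # 5-7 reserved
    return {
        "agent_type": agent_type,
        "ip_type": ip_type,
        "condition": condition,
    }


def module_size_gib(size_kb: int) -> float:
    """Convert the module size (kilobytes of 1024 bytes) to GiB."""
    return size_kb / (1024 * 1024)


if __name__ == "__main__":
    # Values taken from the trap above: flags 0, module 5, size 1048576 KB.
    print(decode_trap_flags(0))
    print(f"Module 5 size: {module_size_gib(1048576)} GiB")
```

With the values from this trap, the flags value of 0 decodes to a server agent, a static IP entry and a condition of "not used", and the 1048576 KB module works out to a 1 GiB DIMM in module position 5 on the system board.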

Mib Information
The associated MIB File Name for this trap is cpqhlth.mib and the MIB identifier CPQHLTH-MIB  


We had some Windows updates to apply (during the day) and restarted a number of servers, but I would like to know whether this is a genuine fault with the RAM, or whether there are any diagnostics I can run to verify these errors?

Commented:
It means that the number of times ECC has had to correct data read from the DIMM has exceeded the threshold, so you have a flaky DIMM; it's a bit like a S.M.A.R.T. error on a disk. Rather than looking at it through SIM or SNMP, look at the Integrated Management Log via the System Management Homepage on the particular server; the history is easier to follow there.

Author

Commented:
Logged it with HP and got a replacement DIMM from them, so 100% fixed now!
