
Hardware Error on Sun Blade 100

Posted on 2001-07-17
Last Modified: 2010-04-29
Help!
I am getting errors like this one:
warning: uncorrectable error from pci0(upa mid0) during DVMA write transaction byte mask is ff.
AFSR=240000ff.00000000 AFAR=00000000.6c6db180
double word offset=0,memory module Dimm4 id 31
secondary error from DVMA write transaction

panic[cpu0]/thread=2a10001fd40: Fatal PCI UE ERROR

Then the computer reboots.
The problem is difficult to track down because the system runs fine for 2 or 3 days, then I get two to five of these errors in a row, and afterwards it runs fine again for the next few days.


My configuration:
Sun Blade 100 with
1024 MB RAM
            MICRON
            MT18LSDT3272AG-133B1 PC133U-333-542-A
            SG CBNAKKA005 200114
            256MB, SYNCH, 133Mhz, CL3, ECC
 
2 hard disks:
  no. 1: the standard 15 GB disk (shipped with the system)
  no. 2: a 40 GB Maxtor

Software: Solaris 8 04/01 with the recommended patches for Solaris 8 dated 11.07.2001

Oracle 9i RDBMS with 2 databases

output of /usr/platform/sun4u/sbin/prtdiag follows:

# /usr/platform/sun4u/sbin/prtdiag
System Configuration: Sun Microsystems sun4u Sun Blade 100 (UltraSPARC-IIe)
System clock frequency: 84 MHZ
Memory size: 1GB

==================================== CPUs ====================================
                      E$          CPU     CPU      Temperature
CPU   Freq            Size        Impl.   Mask     Die       Ambient
---   --------        ----------  ------  ----     --------  --------
  0   502 MHz         256KB       US-IIe   1.4     79 C      35 C

================================= IO Devices =================================
      Bus   Freq
Brd   Type  MHz   Slot  Name                              Model
---   ----  ----  ----  --------------------------------  ----------------------
 0    pci    33     7   isa/dma-isadma (dma)
 0    pci    33     7   isa/serial-su16550 (serial)
 0    pci    33     7   isa/serial-su16550 (serial)
 0    pci    33     8   sound-pci10b9,5451.10b9.5451.1 (+
 0    pci    33    12   network-pci108e,1101.1 (network)  SUNW,pci-eri
 0    pci    33    12   firewire-pci108e,1102.1001 (fire+
 0    pci    33    13   ide-pci10b9,5229.c3 (ide)
 0    pci    33    19   SUNW,m64B (display)               ATY,RageXL

============================ Memory Configuration ============================
Segment Table:
-----------------------------------------------------------------------
Base Address       Size       Interleave Factor  Contains
-----------------------------------------------------------------------
0x0                256MB      1                  Label DIMM0
0x20000000         256MB      1                  Label DIMM1
0x40000000         256MB      1                  Label DIMM2
0x60000000         256MB      1                  Label DIMM3

=============================== usb Devices ===============================

Name          Port#
------------  -----
mouse         2
keyboard      4
#

If it helps, I can send the crash dump files, but they are 87 and 88 MB in size.



Any suggestions?
Thanks.

Iouri
Question by:bespalov
Expert Comment

by:magarity
ID: 6290300
"double word offset=0,memory module Dimm4 id 31"

This line is the kicker: it indicates an uncorrectable ECC error in the memory module labeled DIMM4. Replace that module and the problem should go away. If you have logs of these errors, double-check that DIMM4 is always the one named.
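To check which module the errors name over time, a short script can tally the module names found in the system log. This is a minimal sketch, not a Sun tool: it assumes the messages end up in /var/adm/messages (the default Solaris syslog file) and that each one contains text of the form "memory module <name>", as in the message quoted in the question. The function name count_dimm_errors is hypothetical.

```python
import re
from collections import Counter

# Assumed pattern, modeled on the quoted line "... memory module Dimm4 id 31".
# Real Solaris wording may differ slightly; adjust the expression if needed.
DIMM_RE = re.compile(r"memory\s+module\s+(\w+)", re.IGNORECASE)


def count_dimm_errors(lines):
    """Count how often each memory module name appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = DIMM_RE.search(line)
        if match:
            # Normalize case so "Dimm4" and "DIMM4" are counted together.
            counts[match.group(1).upper()] += 1
    return counts
```

On the affected machine, something like count_dimm_errors(open("/var/adm/messages")).most_common() would list the modules by error count; if one name dominates across all incidents, that supports replacing that module.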

regards,
magarity

Expert Comment

by:magarity
ID: 6290324
PS: Yes, ECC is supposed to correct memory errors, but it can only correct single-bit errors. This message indicates that ECC could not correct the error because more than one bit was wrong.
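To illustrate the difference between a correctable and an uncorrectable error, here is a toy single-error-correct, double-error-detect (SEC-DED) code: an extended Hamming(8,4) code protecting 4 data bits with 4 check bits. ECC DIMMs are typically 72 bits wide, carrying 8 check bits for every 64 data bits, and the exact code used by the UltraSPARC memory controller is not reproduced here; this sketch only demonstrates the principle that one flipped bit can be located and fixed, while two flipped bits can be detected but not fixed.

```python
def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) codeword.

    Index 0 holds an overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions, with check bits at the powers of two (1, 2, 4).
    """
    d1, d2, d3, d4 = data
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d1, d2, d3, d4
    c[1] = c[3] ^ c[5] ^ c[7]   # covers positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]   # covers positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]   # covers positions with bit 2 set
    c[0] = sum(c[1:]) % 2       # overall parity over positions 1..7
    return c


def decode(word):
    """Return (status, data): status is 'ok', 'corrected' or 'uncorrectable'."""
    c = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos      # XOR of set positions; 0 for a valid codeword
    overall = sum(c) % 2         # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume a single-bit error: the syndrome is its position
        # (syndrome 0 means the overall parity bit itself flipped).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Parity still even but syndrome nonzero: two bits flipped.
        return "uncorrectable", None
    return status, [c[3], c[5], c[6], c[7]]
```

Flipping one bit of encode([1, 0, 1, 1]) decodes back to the original data with status "corrected"; flipping two bits yields "uncorrectable", which is the situation the panic message reports.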

Also, I don't know whether Sun numbers the DIMMs starting at 0 or 1, so check the motherboard closely; the slots should be labeled somewhere.
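The prtdiag output in the question offers one way to cross-check the numbering. Assuming the AFAR value in the error message is a physical address in the same address space as the prtdiag segment table (an assumption; the message does not say so), the failing address 0x6c6db180 falls in the segment starting at 0x60000000, which prtdiag labels DIMM3. If that holds, the kernel's "Dimm4" and prtdiag's "DIMM3" may be the same physical module, counted from 1 and from 0 respectively; the labels on the board remain the authoritative check. The helper names parse_afar and segment_for are hypothetical.

```python
# Segment table transcribed from the prtdiag output: (base, size, label).
# Each segment is 256 MB; the bases are spaced 512 MB apart.
SEGMENTS = [
    (0x00000000, 256 << 20, "DIMM0"),
    (0x20000000, 256 << 20, "DIMM1"),
    (0x40000000, 256 << 20, "DIMM2"),
    (0x60000000, 256 << 20, "DIMM3"),
]


def parse_afar(text):
    """Convert an address printed as 'hi.lo' (two 32-bit hex halves) to an int."""
    hi, lo = text.split(".")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_for(addr, segments=SEGMENTS):
    """Return the label of the segment containing addr, or None if unmapped."""
    for base, size, label in segments:
        if base <= addr < base + size:
            return label
    return None


afar = parse_afar("00000000.6c6db180")
print(hex(afar), segment_for(afar))  # prints: 0x6c6db180 DIMM3
```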
Author Comment

by:bespalov
ID: 6294960
Hi magarity,
it doesn't help. If I replace this DIMM, I get the error on another one, but it is always the last one.
Something else: if I don't start the databases, I don't get any errors.
If I start all the databases, I have only 30-50 MB of RAM left.

Accepted Solution

by:magarity
ID: 6295415
Well, a couple of observations:
1. The error message indicates an uncorrectable (multi-bit) ECC error in the RAM.
2. Starting the databases makes heavy use of all the RAM.

Conclusion:
There is a problem somewhere in either the RAM or the memory controller logic. I suppose it would be too easy if you had an identical machine to swap memory with, to see whether the problem follows the RAM?

My primary source has been the Sun Managers mailing list archives. This poster had exactly the same error message:
http://www.sunmanagers.org/pipermail/sunmanagers/2001-January/000832.html
and here is the final message in the thread, where the solution was posted:
http://www.sunmanagers.org/pipermail/sunmanagers/2001-January/000872.html

A couple of other posters on that list had similar problems, and all of them solved them by replacing the one offending module named explicitly in the error message, hence my original diagnosis. Only the poster quoted above had to replace all of his memory.

I realize that memory for Sun machines isn't exactly cheap to replace. Is the machine still under warranty?

Good luck!
magarity