Solved

"Single-Bit ECC Errors in Memory Bank"

Posted on 2006-07-02
Last Modified: 2007-12-19
At bootup I see the message "Single-Bit ECC Errors in Memory Bank", but I am able to proceed with booting and nothing seems wrong.

I assume that at least one of the banks of ram is bad (I think I have either 4 or 8 banks in there).

Should I pull one out at a time until I find the broken one (and then replace it) or is it safe to just ignore it? If I can ignore it, is the speed of the ram impacted by this at all?

I don't know if my type of memory corrects single bit errors or simply detects them.
Question by:HappyEngineer
 
LVL 32

Assisted Solution

by:jhance
jhance earned 100 total points
ID: 17026152
If the BIOS screen doesn't identify which RAM module is producing the error, then yes, the one-at-a-time removal method should help you identify the failing one.

ECC = Error-Correcting Code (it CHECKS and CORRECTS).  This type DOES detect and correct single-bit errors.  The other type, PARITY, can only CHECK; since it has no correction capability, the system is halted when an error is found.
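
To make the "check only" limitation of parity concrete, here is a minimal Python sketch. The function names and the sample byte are illustrative assumptions, not anything a real memory controller exposes; the sketch only shows that a single parity bit reveals that an odd number of bits flipped, never which one.

```python
def parity_bit(data: int) -> int:
    """Return the even-parity bit: 1 if the number of set bits is odd, else 0."""
    return bin(data).count("1") % 2


def check_parity(data: int, stored_parity: int) -> bool:
    """Return True when the data still matches the parity bit stored with it."""
    return parity_bit(data) == stored_parity


byte = 0b10110010                 # illustrative value, four bits set
p = parity_bit(byte)              # parity stored alongside the byte

corrupted = byte ^ 0b00001000     # one bit flipped
double = byte ^ 0b00011000        # two bits flipped

print(check_parity(byte, p))       # True: data intact
print(check_parity(corrupted, p))  # False: error detected, position unknown
print(check_parity(double, p))     # True: a double flip slips through
```

With only one check bit there is no way to tell which bit changed, which is why a parity system can do nothing except stop.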
 
LVL 14

Accepted Solution

by:FriarTuk
FriarTuk earned 100 total points
ID: 17026154
Are these SIMMs or DIMMs? Does the motherboard require matched RAM in certain banks?

Yes, you should try to find which module it is by removing them one at a time.

There are also memory testers:
        http://www.memtest86.com/#download0
        http://www.simmtester.com/page/products/doc/docinfo.asp
        http://oca.microsoft.com/en/windiag.asp#top
 
LVL 6

Expert Comment

by:engineer_dell
ID: 17027091
ECC has the ability to correct a detected single-bit error in a 64-bit block of memory. When this happens, the computer will continue without a hiccup; it will have no idea that anything even happened. However, if you have a corrected error, it is useful to know this; a pattern of errors can indicate a hardware problem that needs to be addressed. Chipsets allowing ECC normally include a way to report corrected errors to the operating system, but it is up to the operating system to support this.
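
As one example of an operating system exposing corrected errors: on Linux, when an EDAC driver for the chipset is loaded, the kernel publishes per-controller counters under /sys/devices/system/edac/mc. The sketch below reads them. It assumes a Linux system, which this thread does not confirm, and the function name read_ecc_counts is made up for illustration.

```python
from pathlib import Path


def read_ecc_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from Linux EDAC sysfs.

    Returns an empty dict when the directory is missing, e.g. no EDAC
    driver is loaded or the system is not Linux.
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())  # corrected errors
            ue = int((mc / "ue_count").read_text().strip())  # uncorrected errors
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_ecc_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that keeps climbing between checks is the "pattern of errors" worth acting on; a count that stays flat suggests a one-off event.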

If a minor (one-bit) memory error occurs, the ECC logic will handle it. If a two-bit or larger error occurs in ECC memory, your system will be halted--similar to what happens with parity memory when any error is encountered.
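
The correct-one, detect-two behaviour described above can be sketched in Python with a toy extended Hamming code. This is an assumption-laden miniature: it protects 4 data bits with 4 check bits, whereas ECC modules typically protect 64 data bits with 8 check bits, and the names encode and decode are invented for the example.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming codeword.

    Index 0 is the overall parity bit; indexes 1..7 are Hamming(7,4)
    positions, with check bits at 1, 2, 4 and data at 3, 5, 6, 7.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data): 'ok', 'corrected', or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall = sum(code) % 2          # 0 for an intact codeword
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    data = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, data


word = encode(0b1011)
word[5] ^= 1                         # single-bit error
print(decode(word))                  # ('corrected', 11)
word[2] ^= 1                         # second error in the same word
print(decode(word))                  # ('uncorrectable', None)
```

The first result corresponds to the boot message in the question (the error was fixed and the machine carried on); the second corresponds to the halt described for multi-bit errors.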

See this reference:
http://www.pcguide.com/ref/ram/errECC-c.html

Or you can download third-party memory testing software to find the fault:
http://www.memtest.org/#downiso

Hope you find this helpful,
Engineer_Dell
 
LVL 44

Assisted Solution

by:scrathcyboy
scrathcyboy earned 100 total points
ID: 17028938
In general, it is NOT safe to ignore ANY RAM error in Windows.  Remove the modules and see if they have a CL rating printed on the RAM card.  If so, and it is CL3, it means your RAM is too slow for the chipset on the motherboard.  Most fast chipsets like NVidia need CL2.5 or faster.  Only the slowest chipsets can handle CL3, so if you have CL3 RAM (check the net for your product number if the CL is not printed on the modules), then you will get intermittent errors like the ones you are getting.

It is NOT safe to proceed this way. You will get random seizures, failures, data loss, even corruption of the chipsets and eventual loss of the motherboard if you continue.  It is possible you have just one bad RAM chip, but most likely the RAM cannot perform to the latency requirements of the board.  It is not advisable to run with RAM too slow for the MB (slow has nothing to do with the PC xxx rating, BTW).
 
LVL 2

Assisted Solution

by:SaxicolousOne
SaxicolousOne earned 100 total points
ID: 17031798
Honestly, I wouldn't make blanket statements about the necessary CAS latency of memory modules without knowing what the system in question calls for. HappyEngineer, just check your system's (or specific motherboard's) recommendations as to the memory module specifications if this is something you're worried about. If you are concerned that your modules may be inappropriate for your system, you can have a look at exactly what memory you've got using CPU-Z...

http://www.cpuid.com/cpuz.php

... and compare that with the motherboard's stated requirements. If you can't find any stated requirements anywhere for your system, just go to http://www.kingston.com/ or some other memory manufacturer's site, look for THEIR recommended modules for your system, and have a look at those modules' specs.

If you want, you could also have a look in your BIOS and see what SPD timings (CAS, etc.) are supported by your system.
 
LVL 44

Expert Comment

by:scrathcyboy
ID: 17033020
A good CAS match to the MB chipset is one of THE most important issues for stability.  Single-bit ECC errors in this case are more likely related to the MB chipset not liking that ECC RAM, but the CAS issue is still supremely relevant.  With the wrong CAS, this is what you can get with ECC.
 
LVL 12

Assisted Solution

by:GinEric
GinEric earned 100 total points
ID: 17039892
Error Correction for single bit errors is between the two ends, the memory and the input/output registers.  It's generated at each end and checked.  Therefore, it is not simply a memory problem.  One Bit Error Correction is designed more to fix a bit that got lost somewhere in the middle, not in memory.

It could just as well be a bad gate on the motherboard as a memory card.  In fact, that would be more likely, since the Error Correction Code is stored in the parity bits in memory.

If you don't know what RAS and CAS timing are, you should not play with them.  The Refresh timers will cause all kinds of errors if you tweak them out of specification.

Dynamic memory does not hold its contents indefinitely; that's how modern memory works.  Each cell is a tiny capacitor that leaks charge, so every row has to be refreshed periodically (typically within about 64 milliseconds).  The RAS and CAS timings control how quickly a row and a column can be addressed, while the refresh timers control how long a row may go before its next refresh.  While a row is being refreshed, you cannot access that memory; the request waits, but only for a fraction of a microsecond.  This is masked by how memory is transferred in large blocks, so you hardly ever see any latency.  To see how this works, look up dynamic RAM refresh.

A One Bit Error is generally an indication that something outside of memory lost a bit, not the memory itself.  Anything from fluorescent lights to power surges in other devices can cause the loss of a bit here or there.  An air conditioner switching on could do it.  But if it's always the same bit, either some circuit is weak or some cable is loose or dirty, or perhaps some chip has a bad solder joint.  It could even be a badly laser-welded pin leg on a chip, and merely walking by it could cause enough vibration to temporarily disconnect the pin.  I've seen all this in production and in the laboratory, under a scanning electron microscope.  I have even seen bad runs of all sorts of IC chips that had bad laser welds.

If the power supply is getting weak, or the motherboard is getting too hot, it can start to show up as one bit errors.

Often, replacing the memory only delays the eventual catastrophe by masking the real problem.  Then, when the real component fails, it takes the data with it.  So, even if it works after replacing the memory, there is no guarantee that you have actually found the problem, and, it may come back to haunt you.

You should check all of your voltages, connections, clear the dust from the motherboard and fans, hard drives, etc., and visually inspect while doing so, even use your sense of smell to sniff out possible burned components.  Check all cabling, and press on the connectors to make sure they're firmly seated.  Aside from that, you can take out the memory cards and clean the lands, if you have the proper cleaner.

That's how it's done in the field by engineers.


 

Author Comment

by:HappyEngineer
ID: 17055483
This is all useful information, but for some reason the message stopped coming up during booting. I've been waiting to see if it reoccurs, but so far it hasn't. I obviously can't go in and change things in order to fix it if it isn't giving me any messages to indicate that it's still happening, so I guess I'll close the question.

I tried memtest86, but it didn't find any problems and I haven't seen the error during bootup since I ran it.
 
LVL 12

Expert Comment

by:GinEric
ID: 17055581
It may have been heat or any other condition.  One-bit errors are flaky, but it will return one day, count on it.