"Single-Bit ECC Errors in Memory Bank"

At bootup I see the message "Single-Bit ECC Errors in Memory Bank", but I am able to proceed with booting and nothing seems wrong.

I assume that at least one of the RAM banks is bad (I think I have either 4 or 8 banks installed).

Should I pull them out one at a time until I find the broken one (and then replace it), or is it safe to just ignore it? If I can ignore it, is the speed of the RAM affected by this at all?

I don't know if my type of memory corrects single bit errors or simply detects them.
FriarTuk commented:
Are these SIMMs or DIMMs? Does the motherboard require matched RAM in certain banks?

Yes, you should try to find which one it is and remove it (one at a time).

But there are also memory testers you can use.
jhance commented:
If the BIOS screen doesn't identify which RAM module is producing the error, then yes, the one-at-a-time removal method should help you identify the failing one.

ECC = Error CHECK and CORRECT. This type DOES detect and correct single-bit errors. The other type, PARITY, can only CHECK; since it has no correction capability, the system is halted.
ECC has the ability to correct a detected single-bit error in a 64-bit block of memory. When this happens, the computer will continue without a hiccup; it will have no idea that anything even happened. However, if you have a corrected error, it is useful to know this; a pattern of errors can indicate a hardware problem that needs to be addressed. Chipsets allowing ECC normally include a way to report corrected errors to the operating system, but it is up to the operating system to support this.

If a minor (one-bit) memory error occurs, the ECC logic will handle it. If a two-bit error occurs in ECC memory, it is detected but cannot be corrected, so your system will be halted, similar to what happens with parity memory when any error is encountered.
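To make the single-bit-correct / double-bit-detect behaviour concrete, here is a toy software sketch of an extended Hamming SEC-DED code over a 64-bit word: 7 Hamming check bits plus 1 overall parity bit gives a 72-bit codeword, which is why ECC modules are 72 bits wide. This is an illustration of the technique under that assumption, not the circuit or bit layout of any particular chipset; the names `encode` and `decode` are hypothetical.

```python
"""Toy SEC-DED (72,64) extended Hamming code, illustrating what ECC memory does per 64-bit word."""
from typing import List, Optional, Tuple

# Hamming check bits sit at power-of-two positions 1..64; position 0 holds overall parity.
CHECK_POSITIONS = (1, 2, 4, 8, 16, 32, 64)
DATA_POSITIONS = [p for p in range(1, 72) if p not in CHECK_POSITIONS]  # 64 data slots


def encode(data: int) -> List[int]:
    """Spread a 64-bit word over a 72-bit codeword (list of 0/1 bits)."""
    if not 0 <= data < 1 << 64:
        raise ValueError("data must fit in 64 bits")
    code = [0] * 72
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    for p in CHECK_POSITIONS:
        # Check bit p makes the parity of every position whose index has bit p set even.
        code[p] = sum(code[q] for q in range(1, 72) if q & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity enables double-error detection
    return code


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status) where status is "ok", "corrected" or "uncorrectable"."""
    syndrome = 0
    for q in range(1, 72):
        if code[q]:
            syndrome ^= q  # XOR of set positions; 0 for a valid codeword
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome < 72:
        # Single-bit error: the syndrome is its position (0 = the overall parity bit itself).
        code = list(code)
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity (a double-bit error), or an
        # impossible syndrome: the error is detected but cannot be fixed.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= code[pos] << i
    return data, status
```

Flipping any single bit of an encoded word and calling `decode` returns the original word with status `"corrected"` (the machine carries on), while flipping two bits returns `"uncorrectable"`, which is the case where a real system would halt.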


Or you can download third-party memory-testing software to find the fault.

Hope you find this helpful,
scrathcyboy commented:
In general, it is NOT safe to ignore ANY RAM error reported by Windows. Remove the modules and see whether a CL (CAS latency) rating is printed on the RAM card. If it is CL3, it means your RAM is too slow for the chipset on the motherboard. Most fast chipsets, like NVIDIA's, need CL2.5 or faster. Only the slowest chipsets can handle CL3, so if you have CL3 RAM (check the net for your product number if the CL is not printed on the modules), you will get intermittent errors like the ones you are seeing.

It is NOT safe to proceed this way. You will get random seizures, failures, data loss, even corruption of the chipsets and eventual loss of the motherboard if you continue. It is possible you have just one bad RAM chip, but most likely the RAM cannot meet the latency requirements of the board. It is not advisable to run with RAM that is too slow for the motherboard (slow has nothing to do with the PC xxx rating, BTW).
SaxicolousOne commented:
Honestly, I wouldn't make blanket statements about the necessary CAS latency of memory modules without knowing what the system in question calls for. HappyEngineer, just check your system's (or specific motherboard's) recommendations for memory module specifications if this is something you're worried about. If you are concerned that your modules may be inappropriate for your system, you can see exactly what memory you have using CPU-Z and compare that with the motherboard's stated requirements. If you can't find any stated requirements anywhere for your system, just go to http://www.kingston.com/ or another memory manufacturer's site, look for THEIR recommended modules for your system, and check those modules' specs.

If you want, you could also have a look in your BIOS and see what SPD timings (CAS, etc.) are supported by your system.
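When comparing modules, it can help to convert a CL rating into absolute time, since CL is counted in clock cycles and a lower number at a lower clock is not necessarily faster. The sketch below uses the standard DDR relationship (data moves on both clock edges, so one clock lasts 2000 / data rate nanoseconds); the function name `cas_latency_ns` and the example modules are illustrative assumptions, not figures from this thread.

```python
def cas_latency_ns(cl: float, data_rate_mts: float) -> float:
    """Absolute CAS latency in nanoseconds for DDR-family memory.

    DDR transfers data on both clock edges, so the memory clock runs at
    data_rate / 2 MHz and one clock cycle lasts 2000 / data_rate nanoseconds.
    """
    return cl * 2000 / data_rate_mts


# Example modules (ratings chosen for illustration only).
for name, cl, rate in [("DDR-400 CL3", 3, 400), ("DDR-400 CL2.5", 2.5, 400), ("DDR-333 CL2.5", 2.5, 333)]:
    print(f"{name}: {cas_latency_ns(cl, rate):.1f} ns")
```

By this measure DDR-400 at CL3 works out to 15 ns, essentially the same absolute latency as DDR-333 at CL2.5, which is why the CL number alone says little without the clock speed the motherboard actually runs the memory at.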
A good CAS match to the motherboard chipset is one of THE most important factors for stability. Single-bit ECC errors in this case are more likely related to the motherboard chipset not liking that ECC RAM, but the CAS issue is still supremely relevant. With the wrong CAS, this is what you can get with ECC.
GinEric commented:
Error correction for single-bit errors operates between the two ends: the memory and the input/output registers. The code is generated and checked at each end, so it is not simply a memory problem. One-bit error correction is designed more to fix a bit that got lost somewhere in between, not in memory.

It could just as well be a bad gate on the motherboard as a bad memory card; in fact, that would be more likely, since the error correction code is stored in the parity bits in memory.

If you don't know what RAS and CAS timing are, you should not play with them.  The Refresh timers will cause all kinds of errors if you tweak them out of specification.

Your memory is not on all the time; that's how modern memory works, to cut down on power consumption. The RAS and CAS timers determine how long memory is allowed to be powered off and still retain its bits until the next refresh cycle. During the refresh cycle, you cannot access that memory; the request waits briefly in a queue. This is masked by how memory is transferred in large blocks, so you hardly ever see any latency. To see how this works, look up dynamic RAM (DRAM) refresh.

A one-bit error is generally an indication that something outside of memory lost a bit, not the memory itself. Anything from fluorescent lights to power surges in other devices can cause the loss of a bit here or there; an air conditioner switching on could do it. But if it's always the same bit, either some circuit is weak, some cable is loose or dirty, or perhaps some chip has a bad solder joint. It could even be a badly laser-welded pin leg on a chip, and merely walking by could cause enough vibration to temporarily disconnect the pin. I've seen all of this in production and in the laboratory, under a scanning electron microscope. I have even seen bad runs of all sorts of IC chips that had bad laser welds.
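Telling random one-off upsets apart from "always the same bit" requires keeping a tally of corrected errors over time. The sketch below assumes the chipset and operating system expose corrected-error reports as records with a module label, address and bit number; that `CorrectedError` record format and the `repeat_offenders` helper are hypothetical, and the threshold of 3 is an arbitrary illustrative choice.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class CorrectedError(NamedTuple):
    """One corrected-error report (hypothetical record format)."""
    module: str   # e.g. a DIMM slot label
    address: int  # address reported for the error
    bit: int      # which bit in the 64-bit word was corrected


def repeat_offenders(events: Iterable[CorrectedError], threshold: int = 3) -> List[Tuple[str, int, int]]:
    """Return (module, address, bit) locations corrected at least `threshold` times."""
    counts = Counter((e.module, e.address, e.bit) for e in events)
    return sorted(loc for loc, n in counts.items() if n >= threshold)
```

Scattered single hits across different locations fit the interference explanation; a location that keeps showing up points toward a weak circuit, loose connection or failing part worth chasing down.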

If the power supply is getting weak, or the motherboard is getting too hot, it can start to show up as one bit errors.

Often, replacing the memory only delays the eventual catastrophe by masking the real problem.  Then, when the real component fails, it takes the data with it.  So, even if it works after replacing the memory, there is no guarantee that you have actually found the problem, and, it may come back to haunt you.

You should check all of your voltages and connections, clear the dust from the motherboard, fans, hard drives, etc., and visually inspect everything while doing so; even use your sense of smell to sniff out possibly burned components. Check all cabling, and press on the connectors to make sure they're firmly seated. Aside from that, you can take out the memory cards and clean the lands, if you have the proper cleaner.
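For the voltage check, readings from a multimeter or the BIOS hardware monitor can be compared against nominal rail values. This sketch assumes the commonly cited ATX power-supply tolerances (±5% on +3.3 V, +5 V and +12 V, ±10% on −12 V); the `RAILS` table, the `out_of_spec` helper and the sample readings are illustrative assumptions, so check your supply's own documentation for its exact limits.

```python
from typing import Dict, List

# Nominal ATX rails and their allowed deviation as a fraction of nominal
# (assumed from commonly cited ATX tolerances).
RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def out_of_spec(readings: Dict[str, float]) -> List[str]:
    """Return the names of rails whose measured voltage is outside tolerance."""
    bad = []
    for rail, measured in readings.items():
        nominal, tol = RAILS[rail]
        if abs(measured - nominal) > abs(nominal) * tol:
            bad.append(rail)
    return bad


# Hypothetical readings: a sagging +5 V rail stands out.
print(out_of_spec({"+3.3V": 3.31, "+5V": 4.70, "+12V": 12.2, "-12V": -11.5}))
```

A rail drifting toward its limit under load is exactly the kind of weakening supply that can show up as occasional one-bit errors before anything fails outright.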

That's how it's done in the field by engineers.

HappyEngineer (author) commented:
This is all useful information, but for some reason the message stopped appearing during boot. I've been waiting to see if it recurs, but so far it hasn't. I obviously can't go in and change things to fix it if it isn't giving me any messages to indicate that it's still happening, so I guess I'll close the question.

I tried memtest86, but it didn't find any problems and I haven't seen the error during bootup since I ran it.
It may have been heat or some other condition. One-bit errors are flaky, but they will return one day; count on it.