Solved

"Single-Bit ECC Errors in Memory Bank"

Posted on 2006-07-02
Last Modified: 2007-12-19
At bootup I see the message "Single-Bit ECC Errors in Memory Bank", but I am able to proceed with booting and nothing seems wrong.

I assume that at least one of the banks of RAM is bad (I think I have either 4 or 8 banks in there).

Should I pull one out at a time until I find the broken one (and then replace it), or is it safe to just ignore it? If I can ignore it, is the speed of the RAM impacted by this at all?

I don't know if my type of memory corrects single bit errors or simply detects them.
Question by:HappyEngineer
10 Comments
 
LVL 32

Assisted Solution

by:jhance
jhance earned 100 total points
ID: 17026152
If the BIOS screen doesn't identify which RAM module is producing the error, then yes, the one-at-a-time removal method should help you identify the failing one.

ECC = Error CHECK and CORRECT.  This type DOES detect and correct single-bit errors.  The other type, PARITY, can only CHECK; since it has no ability to correct, the system is halted when an error is found.
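To illustrate why parity can only check: a single parity bit records whether the number of set bits is odd or even, so it can tell that something flipped but not which bit. A minimal sketch in Python; the names parity_bit and check are hypothetical, and real parity memory does this in hardware per byte:

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the number of set bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def check(byte: int, stored_parity: int) -> bool:
    """True if the byte still agrees with the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


data = 0b10110010
p = parity_bit(data)               # parity stored when the byte was written

corrupted = data ^ 0b00001000      # one bit flipped in "memory"
print(check(data, p))              # True
print(check(corrupted, p))         # False: error detected, location unknown
```

Every single-bit flip fails the check in the same way, so there is nothing to correct from, and an even number of flips passes unnoticed.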
 
LVL 14

Accepted Solution

by:FriarTuk
FriarTuk earned 100 total points
ID: 17026154
Are these SIMMs or DIMMs? Does the motherboard require matched RAM in certain banks?

Yes, you should try to find which module it is by removing them one at a time.

There are also memory testers:
        http://www.memtest86.com/#download0
        http://www.simmtester.com/page/products/doc/docinfo.asp
        http://oca.microsoft.com/en/windiag.asp#top
 
LVL 6

Expert Comment

by:engineer_dell
ID: 17027091
ECC has the ability to correct a detected single-bit error in a 64-bit block of memory. When this happens, the computer will continue without a hiccup; it will have no idea that anything even happened. However, if you have a corrected error, it is useful to know this; a pattern of errors can indicate a hardware problem that needs to be addressed. Chipsets allowing ECC normally include a way to report corrected errors to the operating system, but it is up to the operating system to support this.
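As one illustration of OS-side reporting: on Linux (an assumption; this thread does not say which operating system is in use), the kernel's EDAC subsystem exposes corrected and uncorrected error counters per memory controller under /sys/devices/system/edac/mc/. A minimal sketch that reads them, assuming an EDAC driver is loaded for the chipset; read_edac_counts is a hypothetical helper name:

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs.

    ce_count = corrected errors, ue_count = uncorrected errors.
    Returns an empty dict if the directory does not exist (no EDAC driver).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


for controller, c in read_edac_counts().items():
    print(controller, "corrected:", c["ce_count"], "uncorrected:", c["ue_count"])
```

A corrected count that keeps climbing between checks is the "pattern of errors" worth acting on; a count that stays flat suggests a one-off event.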

If a minor (one-bit) memory error occurs, the ECC logic will handle it. If a two-bit or larger error occurs in ECC memory, your system will be halted--similar to what happens with parity memory when any error is encountered.
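The correct-one, detect-two behaviour can be sketched with a toy code. The example below uses Hamming(7,4) plus an overall parity bit on 4 data bits, purely for illustration; ECC memory applies the same idea to 64 data bits with 8 check bits, in hardware. The function names encode and decode are hypothetical:

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit word: Hamming(7,4) plus overall parity."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                       # index 0: overall parity; 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2          # makes the total number of set bits even
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2
    fixed = code.copy()
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        fixed[syndrome] ^= 1             # single flip; syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        return "uncorrectable", None     # even number of flips: detected, not fixable
    data = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, data


word = encode(0b1011)
word[5] ^= 1                             # one bit flips
print(decode(word))                      # ('corrected', 11)
word[2] ^= 1                             # a second bit flips
print(decode(word))                      # ('uncorrectable', None)
```

The second result is the case where a real system halts rather than hand back data it cannot trust.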

Refer to this:
http://www.pcguide.com/ref/ram/errECC-c.html

Or you can download third-party memory-testing software to find the fault:
http://www.memtest.org/#downiso

Hope you find this helpful,
Engineer_Dell
 
LVL 44

Assisted Solution

by:scrathcyboy
scrathcyboy earned 100 total points
ID: 17028938
In general, it is NOT safe to ignore ANY RAM error in Windows.  Remove the modules and see if they have a CL rating printed on them.  If so, and it is CL3, it means your RAM is too slow for the chipset on the motherboard.  Most fast chipsets like NVidia need CL2.5 or faster.  Only the slowest chipsets can handle CL3, so if you have CL3 RAM (check the net for your product number if the CL is not printed on the modules), then you will get intermittent errors like the ones you are getting.

It is NOT safe to proceed this way. You will get random seizures, failures, data loss, even corruption of the chipsets and eventual loss of the motherboard if you continue.  It is possible you have just one bad RAM chip, but most likely the RAM cannot perform to the latency requirements of the board.  It is not advisable to run with RAM too slow for the MB (slow has nothing to do with the PC xxx rating, BTW).
 
LVL 2

Assisted Solution

by:SaxicolousOne
SaxicolousOne earned 100 total points
ID: 17031798
Honestly, I wouldn't make blanket statements about the necessary CAS latency of memory modules without knowing what the system in question calls for. HappyEngineer, just check your system's (or specific motherboard's) recommendations as to the memory module specifications if this is something you're worried about. If you are concerned that your modules may be inappropriate for your system, you can have a look at exactly what memory you've got using CPU-Z...

http://www.cpuid.com/cpuz.php

... and compare that with the motherboard's stated requirements. If you can't find any stated requirements anywhere for your system, just go to http://www.kingston.com/ or some other memory manufacturer's site, look for THEIR recommended modules for your system, and have a look at those modules' specs.

If you want, you could also have a look in your BIOS and see what SPD timings (CAS, etc.) are supported by your system.
 
LVL 44

Expert Comment

by:scrathcyboy
ID: 17033020
A good CAS match to the MB chipset is one of THE most important issues for stability.  Single-bit ECC errors in this case are more likely related to the MB chipset not liking that ECC RAM, but the CAS issue is still supremely relevant.  With the wrong CAS, this is what you can get with ECC.
 
LVL 12

Assisted Solution

by:GinEric
GinEric earned 100 total points
ID: 17039892
Error Correction for single-bit errors operates between the two ends, the memory and the input/output registers.  It's generated at each end and checked.  Therefore, it is not simply a memory problem.  One-Bit Error Correction is designed more to fix a bit that got lost somewhere in the middle than one lost in memory.

It could just as well be a bad gate on the motherboard as a memory card, in fact, it would be more likely, since the Error Correction Code is stored in the parity bits in memory.

If you don't know what RAS and CAS timing are, you should not play with them.  The Refresh timers will cause all kinds of errors if you tweak them out of specification.

Your memory is not on all the time, that's how modern memory works, to cut down on power consumption.  The RAS and CAS timers determine how long memory is allowed to be powered off and still retain memory of bits until the next refresh cycle.  During the refresh cycle, you cannot access that memory, the request goes into a wait queue for a few milliseconds.  This is masked by how memory is transferred in large blocks, so you hardly ever see any latency.  To see how this works, look up dynamic refresh memory.

A One-Bit Error is generally an indication that something outside of memory lost a bit, not the memory itself.  Anything from fluorescent lights to power surges in other devices can cause the loss of a bit here or there.  An air conditioner switching on could do it.  But if it's always the same bit, either some circuit is weak, some cable is loose or dirty, or perhaps some chip has a bad solder joint.  It could even be a badly laser-welded pin leg on a chip, and merely walking by it could cause enough vibration to temporarily disconnect the pin.  I've seen all this in production and in the laboratory, under a scanning electron microscope.  I have even seen bad runs of all sorts of IC chips that had bad laser welds.
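The "always the same bit" test can be sketched in a few lines, assuming you have some record of corrected errors as (address, bit) pairs. That log format, the classify_errors name, and the threshold of three are all hypothetical, chosen only for illustration:

```python
from collections import Counter


def classify_errors(events, repeat_threshold=3):
    """Classify corrected-error events given as (address, bit) tuples.

    Returns ("none", []), ("scattered", []) or ("repeating", [locations]),
    where locations are the (address, bit) pairs seen at least
    repeat_threshold times.
    """
    counts = Counter(events)
    if not counts:
        return "none", []
    repeats = sorted(loc for loc, n in counts.items() if n >= repeat_threshold)
    if repeats:
        return "repeating", repeats   # same bit every time: suspect hardware
    return "scattered", []            # random locations: more likely transient


log = [(0x1F40, 5), (0x1F40, 5), (0x8A00, 1), (0x1F40, 5)]
print(classify_errors(log))           # ('repeating', [(8000, 5)])
```

A "repeating" result points at a weak circuit, connection or cell at that location; "scattered" fits the power-surge and interference cases.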

If the power supply is getting weak, or the motherboard is getting too hot, it can start to show up as one bit errors.

Often, replacing the memory only delays the eventual catastrophe by masking the real problem.  Then, when the real component fails, it takes the data with it.  So, even if it works after replacing the memory, there is no guarantee that you have actually found the problem, and, it may come back to haunt you.

You should check all of your voltages, connections, clear the dust from the motherboard and fans, hard drives, etc., and visually inspect while doing so, even use your sense of smell to sniff out possible burned components.  Check all cabling, and press on the connectors to make sure they're firmly seated.  Aside from that, you can take out the memory cards and clean the lands, if you have the proper cleaner.

That's how it's done in the field by engineers.


 
LVL 14

Expert Comment

by:FriarTuk
ID: 17054515
 

Author Comment

by:HappyEngineer
ID: 17055483
This is all useful information, but for some reason the message stopped coming up during booting. I've been waiting to see if it reoccurs, but so far it hasn't. I obviously can't go in and change things in order to fix it if it isn't giving me any messages to indicate that it's still happening, so I guess I'll close the question.

I tried memtest86, but it didn't find any problems and I haven't seen the error during bootup since I ran it.
 
LVL 12

Expert Comment

by:GinEric
ID: 17055581
It may have been heat or some other condition.  One-bit errors are flaky, but it will return one day, count on it.