Avatar of esabet
esabet

asked on

Chronic BSOD (Blue Screen of Death)

I have a custom made PC and here are the basic specs:

Mobo: EVGA nForce 680i SLI
CPU: Intel Core 2 Duo E6700
RAM: 2GB OCZ Model No. OCZ2N10662GK

The OS sits on a RAID 0 and the RAID controller is built into the mobo.  I have attached the mobo spec for detailed review if need be.

When I originally purchased this custom PC, all was well but after a while I began getting BSODs, but only intermittently.

Since the BSODs were intermittent I did not think much of it and assumed it was typical of Windows operating systems. At that time Windows Vista Ultimate was installed on the PC.

Then the BSODs became more and more frequent, to the point I felt the PC was no longer stable. My first guess was that the OS was corrupt, but at that time I did not feel like reinstalling the OS.

A few days later the PC would not POST and all I would hear at boot up was ONE LONG BEEP that would repeat about every 5 seconds or so with NO POST!

After some reading I decided to check the RAM, and after many trials I discovered that if I removed the two RAM modules closest to the CPU the PC would POST and boot into Windows successfully.  At that time I was not using the OCZ RAM; the modules were Mushkins.

I assumed the two RAM modules that I had to remove were faulty and continued to use the PC without them but the BSOD issue was not really resolved and was happening more often, to the point I stopped using the PC as it was very unstable.

Recently I decided to purchase four new RAM modules and reinstall the OS and bring the PC back to life.  I checked the mobo website and purchased the four OCZ RAMS as they were listed to be supported by the mobo.

Before the new RAMS arrived I installed the new OS, and this time as a trial I chose to install Windows Server 2008, as I was told it is one of the most stable OSes on the market today.  This way I could assess things better.

At first all was well and the PC would run for hours without interruptions and I thought "FANTASTIC".  I could not wait till I receive the 4 new RAM modules to make this a nice machine.

As the new RAM modules arrived, I removed what was already in the PC and installed the 4 new RAM modules.  BUT - as I powered up the PC all I heard was ONE LONG BEEP and NO POST!!!!  As soon as I removed the two RAM modules closest to the CPU the PC would boot up no problem.

So I contacted the manufacturer of the mobo.  After a few tests they said they think the second channel of the RAM on the mobo is bad (that is the two slots closest to the CPU) and I have to RMA the mobo.

Meanwhile I continued to use the PC with only the two RAM modules installed, leaving the second channel of the RAM alone.  The PC ran with no interruptions till I started to get the BSODs again.  At first they were very random, but quickly they became more frequent and closer together.  Last night two happened within 15 minutes of each other; I snapped a picture of both occurrences and have attached them here.

Since I have not really been using the second channel of the RAM I don't understand the BSODs. Why is it at first I have no BSOD and then slowly they start appearing and become more and more often???  Any explanations?





122-CK-NF68.pdf
DEN-PC01.jpg
DEN-PC02.jpg
Avatar of frostburn
frostburn

Hi there,

What I would try is to reset the BIOS settings to default and then try booting it up.
It could be that the machine previously had the memory settings in the BIOS changed.

It should be auto-detecting the memory speed or you should manually try to enter the memory clock speed.
But try the default settings first. If that doesn't sort it out then have a look at setting the memory speed to match your RAM.

Hope this helps
-FB
Avatar of esabet

ASKER

I have checked the RAM settings before and will do it again, and I will also try the default settings as you recommended, but I just don't see how that would explain the BSODs.
Why is it that the BSODs are becoming more frequent as time goes by if the RAM/BIOS settings have remained the same from day one of the new OS install?
 
Avatar of esabet

ASKER

I forgot to mention something in my initial post/question:
Typically when you get a BSOD, at the end you see the RAM being dumped onto the drive.  In my case that never takes place; it simply freezes, which explains how I could snap a picture.
Lastly, after the last BSOD and a reboot, I got an error message that winlogon.exe could not be located or is corrupt, but after a couple of reboots and a reboot into safe mode the OS started to load normally!
Well, the BSOD would occur randomly depending on the process load being paged in and out of memory.
I saw that the motherboard supports SLI RAM.

What configuration did you install the ram in and what size is each chip?
Are they identical?

You can try and disable SLI Ready Memory in the BIOS to see if that changes anything.

-FB
Avatar of esabet

ASKER

Each RAM module is 1GB.  Here is the spec for the modules:
1066MHz DDR2; EPP 5-5-5-15 (CAS-TRCD-TRP-TRAS); Unbuffered; 2.1 Volts; 240 Pin DIMM; NVIDIA® SLI certified; EPP-Ready
I will try disabling SLI in the BIOS but in the past I did not use SLI RAMs and I was getting BSODs!
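
When comparing the rated timings on the label against what the BIOS actually shows, it can help to turn the cycle counts into nanoseconds. Below is a minimal Python sketch, assuming the usual DDR convention that the memory clock is half the rated data rate (so "1066MHz DDR2" means roughly a 533 MHz clock). The function name is made up for illustration.

```python
def ddr_timings_ns(data_rate_mts, timings):
    """Convert DDR timings given in clock cycles to nanoseconds.

    data_rate_mts: rated data rate in MT/s (the "1066MHz" on the label).
    timings: dict of timing name -> clock cycles, e.g. {"CAS": 5}.
    """
    clock_mhz = data_rate_mts / 2.0        # DDR transfers twice per clock
    cycle_ns = 1000.0 / clock_mhz          # length of one clock cycle in ns
    return {name: cycles * cycle_ns for name, cycles in timings.items()}


# Rated spec from the post above: 1066MHz DDR2 at 5-5-5-15
rated = ddr_timings_ns(1066, {"CAS": 5, "tRCD": 5, "tRP": 5, "tRAS": 15})
for name, ns in rated.items():
    print(f"{name}: {ns:.2f} ns")
```

The same function can be fed whatever speed and timings the BIOS reports, which makes it easier to see whether the board is running the modules tighter or looser than their rating.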
 
SOLUTION
Avatar of PCBONEZ
PCBONEZ
This solution is only available to members.
Avatar of nobus
can you post the minidump ? rename it to ***.txt
Avatar of esabet

ASKER

nobus, minidump is not enabled, only full kernel.  Should I enable it?
frostburn: I set all settings to default and double checked the memory settings (timing, voltage, etc.).  I still kept only two RAM modules, and on the first try the system had a problem: I got an error saying winlogon can't be found or is corrupt. On reboot my RAID showed an error! On the next reboot I checked the BIOS and, strangely enough, the boot priority was no longer on the RAID.  I fixed that and rebooted the system, and this time the RAID showed as "HEALTHY" and the Windows option to load into safe mode showed up. I chose "Safe Mode" and all was good.  Then I did a chkdsk (w/o the bad sector test), which found no errors, and on reboot Windows booted normally!!  There is something very strange going on and IMO the board has multiple issues, but that still does not explain why the BSODs gradually get more and more frequent!!  That to me does not make sense!!
PCBONEZ: What you say may very well be true, but why is it that if I populate the two RAM slots closest to the CPU and leave the other two slots empty, the system behaves the same way and does not boot up!! I think it's a combination of many problems with the board and an RMA is the way to go.  Also, I still don't understand why, in either scenario, Windows runs smoothly without BSODs for a while but then the BSODs start to appear and get worse and worse!!
SOLUTION
This solution is only available to members.
Strike "I'm". - Enter "It's". [typo]
Avatar of esabet

ASKER

Wow, fantastic reply!!  Very in-depth and informative.  Thanks a million!  I think now I understand the problem clearly!  Thank you PCBONEZ!
I guess I was trying too hard not to take the PC apart!  Though I enjoy putting PCs together, pulling them apart is a heartbreaker! LOL
I guess RMA here I come! I will follow up on this post once I have received my replacement board.
As a bonus, a friend of mine told me that when someone else he knew RMAd his 680i, EVGA sent him a 750 back. So you never know; it's hard to pass on the possibility of a free upgrade!
>>  minidump is not enabled, only full kernel.  Should I enable it.   <<  yes of course
Avatar of esabet

ASKER

nobus: After reading your reply I noticed how dumb I may have sounded. Of course, I have already enabled it and left the restart option off.  I guess I really meant that I would do it, as opposed to asking whether I should do it or not. LOL  You know how it is, sometimes you think one thing but you really mean something else!
well - no problem, just post it here for analysis
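
For reference, the dump type chosen in the Startup and Recovery dialog is stored in the registry under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl. Here is a rough Python sketch that reads it. The value meanings (0 none, 1 complete, 2 kernel, 3 small/minidump) are the ones documented for Vista/Server 2008-era Windows, and the helper names are made up for illustration. It only touches the registry when run on Windows.

```python
import sys

# CrashDumpEnabled values as documented for Vista / Server 2008 era Windows
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
}


def describe_crash_control(crash_dump_enabled, auto_reboot):
    """Turn the two raw registry values into a readable summary."""
    dump = DUMP_TYPES.get(crash_dump_enabled, f"unknown ({crash_dump_enabled})")
    reboot = "restarts automatically" if auto_reboot else "stays on the blue screen"
    return f"dump type: {dump}; after a crash the PC {reboot}"


def read_crash_control():
    """Read the live settings; Windows only (uses the stdlib winreg module)."""
    import winreg
    path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        dump_type, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        auto_reboot, _ = winreg.QueryValueEx(key, "AutoReboot")
    return dump_type, auto_reboot


if __name__ == "__main__":
    if sys.platform == "win32":
        print(describe_crash_control(*read_crash_control()))
    else:
        # Example values only: minidump enabled, automatic restart off
        print(describe_crash_control(3, 0))
```

With minidump selected and automatic restart off, the summary should match what was described above: a small dump is requested and the machine stays on the blue screen.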
Avatar of esabet

ASKER


Hi;
I will for sure post the minidump as soon as I have it, but interestingly enough, as of 2 PM on March 14th (Saturday) no BSODs have appeared!!!
All I did was, per frostburn's recommendation, reset the BIOS to default and then go through the BIOS line by line. I made sure all settings were correct; incidentally, they were identical to the settings prior to the reset to default.  BUT I came across one setting in the BIOS called Limit CPUID MaxVal, and I was not sure what it is for.  The online help does not give much explanation except that it recommends disabling it for Windows XP. So, even though I am not using Windows XP, I chose to disable it.  This is the only change I made to the BIOS.
I am not sure if it is coincidental or what, but any idea what that setting is about? Also, I have not tested installing all 4 RAM modules since I made that change to the BIOS, but I am assuming that is still an issue!
Any idea what Limit CPUID MaxVal in the BIOS is for, and whether it could actually have been a reason for the BSODs?
I have attached a copy of the mobo manual, the BIOS setting is on page 101.

680i-manual.pdf
The simple version is that if enabled it turns off Hyper Threading.
{Limits the CPUID - CPU Identification, in this case to the Operating System.}

It should only be enabled if you are using an OS that does not support Hyper Threading with a CPU that does.

I could see it causing BSOD if you have turned off Hyper Threading in the CPU but have software set up to use Hyper Threading.
-
I'm not sure if it would cause YOUR BSOD's however. - Maybe so.
interesting; i never had to use such a bios setting
I've never had to use it either.
I think it's intended for Win98 and ME and I've never used those on a P4 w/HT either.
Win2K and up have SMP and so support HT.
Avatar of esabet

ASKER


It seems I spoke too soon.
Last night while my daughter was using the PC the system froze (no BSOD, simply froze). So I did a forced reboot (held the power button) and it booted up with no problem, but then it froze again (no BSOD) after about 15 minutes and I forced the PC off.
About 30 minutes later I tried to power up the PC and got a RAID error message and no boot! I changed the BIOS settings to disable RAID, and on reboot I saw strange characters during POST while the HDDs were being detected. I changed the BIOS back to enable RAID and the system was simply acting up: it would either freeze during POST or show a RAID error message.
Desperate to save my data, I did some searching on the EVGA site and came across a way to reset the mobo. I followed the instructions, which consisted of turning off the PSU, resetting the CMOS using the jumper, removing the CMOS backup battery, discharging the mobo, removing all RAM modules, waiting 20 minutes, and then booting up again and setting the CMOS to defaults, etc. Now the RAID was reported as HEALTHY and I was back in business!
In the past I had heard good things about EVGA mobos but this is ridiculous!  Either I have gotten a lemon or this is typical of this brand.  I hope it is not typical!
ASKER CERTIFIED SOLUTION
This solution is only available to members.
SOLUTION
This solution is only available to members.
Avatar of esabet

ASKER

I know for certain there is at least one issue with the mobo: the second channel of the RAM is shot.  I think I will start with the RMA, and after that, if problems come back, I will have to look into the PSU.
Thank you all and I will keep you posted!
While you are waiting for the RMA why don't you take a peek inside that PSU.
If it's BAD you don't want to connect it to yet another motherboard do you?

Look for swollen or bloated capacitors and anything that looks like it's getting toasty. [Burning up.]
Use a flashlight, a wooden stick or insulated screwdriver to move wires, and DON'T have it plugged in.
- When it is plugged in the heatsinks will be energized.
.
Avatar of esabet

ASKER

I took the mobo out last night and will be shipping it out today.  I will take your advice and do a visual examination of the PSU.
Thanks for the advice.
Avatar of esabet

ASKER


Hi;

Just a small update: I just received a notification that a replacement was shipped.  It turns out they are not shipping me an identical board; instead they are shipping me an nForce 780i SLI mobo.

Does this mean I will have to reinstall my OS when I receive the new mobo? Please keep in mind that my OS is sitting on a RAID 0 and I was using the RAID controller built into the old mobo!

Any other recommendations to get myself ready will be greatly appreciated.
 
normally - no, but you must install the drivers.
best remove the old software and drivers first
check also in device manager>view>click show hidden devices
Avatar of esabet

ASKER

Well, when I installed the OS (Win Server 2008) with the old mobo, I did not install any additional drivers or software for the mobo. I let the OS do everything. So I assume ONLY default drivers were installed.
Would you say it is better if I reinstall the OS? I already backed up the drive before the old mobo was shipped, so in case I need any files/docs I can always retrieve them.
 
Nothing says that there are even default drivers for it; just check if you have any on the CD.
it can be that your old card wasn't even installed properly and you just worked in VGA mode... which, of course, you can do with any card
Avatar of esabet

ASKER

nobus,
We are talking about the motherboard and not the graphics card, right? So what does "vga mode" have to do with it? Please enlighten me, thanks!
why would i have been talking about the mobo? your question was on the video card, and vga mode relates to it
Avatar of esabet

ASKER

nobus;
I think there is a miscommunication somewhere; the entire post/question relates to my PC's motherboard ('mobo' for short)!
Not even once have I discussed my video card! :)
yes- you're right, sorry i was misled. blame it on alzheimer effects...
i Take it from here again : >>  I did not install any additional drivers or softwares for the mobo
then you are indeed running on defaults; i would check in device manager for errors, and try to install the manufacturer's drivers though

Avatar of esabet

ASKER

I understand! No problem and I will try your recommendation.
It's advisable to install the chipset drivers even if they don't appear to be needed.
Lacking that some Chipset features may be 'turned off' or using generic drivers that don't offer full performance.
[USB2.0 working at USB1.x speeds is a good example that can happen with some boards.]
- That shouldn't require an OS reinstall as you'd have to already be in the OS to install those drivers.

I believe "Alzheimer Effects" is the wrong term.
Perhaps "Geriatric Moment"? - I prefer "Brain Fart" myself.
Avatar of esabet

ASKER

Thank you everyone. The mobo arrived last night and I will be working on it during the weekend and hopefully things will go smoothly!  Thanks for the advice in advance!
Avatar of esabet

ASKER

Hello all;
Over the weekend I installed the new 780i mobo. But when it came to installing the OS (Win Server 2008) I started having issues.
When I started Windows setup I got the following error message:  Error: access violation (0xC0000005), Address. I tried a couple more times, double checking my BIOS settings, but the same error would occur.
Something told me that it may have to do with the RAM.  I had 4 GB of RAM installed and I decided to remove all except for one.
And what do you know, with only 1GB of RAM no more errors and Windows installed fine.
Then I proceeded with installing the motherboard's Drivers provided by the manufacturer and the Video Card driver. I restarted the machine and still everything is running like clock work.
So I decided to add one more GB of RAM (so now I would have 2 GB of RAM installed). BUT that caused issues: before Windows loads I get an error message that hardware or software has changed, and after selecting continue it asks if I want to start Windows normally. When I choose yes, sometimes Windows starts and sometimes it loops back to the same error message.

But even after Windows loads, I get yet more errors, something that would not show up when there was only 1 GB installed.
I also tried installing 4GB of RAM and the problem gets worse!!!!
Is there an explanation for this?
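
As background on the setup error quoted above: 0xC0000005 is the NTSTATUS value STATUS_ACCESS_VIOLATION, meaning some code touched memory it was not allowed to. On its own that does not say whether RAM, a driver, or something else is at fault. The short Python sketch below splits the value into the standard NTSTATUS bit fields (severity, customer flag, facility, code); the function name and the tiny lookup table are only for illustration.

```python
def decode_ntstatus(code):
    """Split a 32-bit NTSTATUS value into its standard bit fields."""
    severities = {0: "success", 1: "informational", 2: "warning", 3: "error"}
    return {
        "severity": severities[(code >> 30) & 0x3],  # top two bits
        "customer": bool((code >> 29) & 0x1),        # set for non-Microsoft codes
        "facility": (code >> 16) & 0xFFF,            # which subsystem raised it
        "code": code & 0xFFFF,                       # status number in that facility
    }


# Small lookup table for illustration; real tables are much longer
KNOWN = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

fields = decode_ntstatus(0xC0000005)
print(KNOWN.get(0xC0000005, "unknown"), fields)
```

The leading "C" is what marks the value as an error-severity status; the low 16 bits carry the actual status number.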
4 sticks ? Then  read item 10    http://www.corsairmemory.com/memory_basics/index.html      
Avatar of esabet

ASKER

nobus;
I am assuming from your post that you are trying to say my RAM modules may be Unregistered.
But is that accountable for the error when there is only 2GB installed?
Also, the RAM I am using is amongst the RAM modules that the motherboard manufacturer has placed on their compatibility list!  Doesn't that count for something?
No -not with 2 sticks
and i'm not saying anything - that's what you do.
i just attract your attention to the fact that with 4 sticks, all mobos run less stable; some work, some don't - and everything in between....
normally - when they put 4 slots on a mobo, you would expect them to run fine with 4...but if you read item 10 from Corsair - it seems that is just not true...
Avatar of esabet

ASKER

I understand.
I guess my next step is to test the individual RAMs and see if they are faulty!
Good idea.
I'm assuming you know about memtest.

Also double check your manual for the correct slots to use for two sticks.
Some go X _ X _ and others X X _ _
Avatar of esabet

ASKER

I know about the slot configuration and I am using the manufacturer's recommendations.
I am also aware of Memtest but had a question:
there appears to be two different versions of Memtest: There is the good old Memtest86 and Memtest86+.  What is the difference and which one should I use?
Memtest is open source software that is legal for anyone to modify.
It's the same program; there are just two groups working on it.
Since both projects are open source, it's legal for them to steal code from each other, and they do, which basically keeps the program the same. Whichever release (from either one) is newest is the most up-to-date version of the same thing.
- Silly yes?
It turns out to be a good thing for us because the competition between them encourages them to keep it up to date.
I've been using memtest86 v3.4a for a while now with no problems all the way up to (and including) DDR3.
I see version 3.5 is out now and it's newer than memtest86+'s version so that's what I'd use.
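
To give an idea of what these testers do, here is a toy Python version of a simplified "moving inversions" style pass: fill a buffer with a pattern, verify it, then write the complement and verify again. This is an illustration only. The real Memtest86 boots without an OS and tests physical RAM directly; the `FlakyBuffer` class below is a made-up stand-in for a module with one stuck bit, so the test has something to find.

```python
class FlakyBuffer:
    """Hypothetical RAM stand-in: a bytearray where one bit can be stuck at 0."""

    def __init__(self, size, stuck_addr=None, stuck_bit=0):
        self.data = bytearray(size)
        self.stuck_addr = stuck_addr
        self.stuck_mask = ~(1 << stuck_bit) & 0xFF

    def write(self, addr, value):
        if addr == self.stuck_addr:
            value &= self.stuck_mask      # the faulty bit can never be set
        self.data[addr] = value

    def read(self, addr):
        return self.data[addr]

    def __len__(self):
        return len(self.data)


def moving_inversions(mem, pattern=0x55):
    """Return a list of (address, expected, actual) for every mismatch."""
    errors = []
    inverse = ~pattern & 0xFF
    for value in (pattern, inverse):
        for addr in range(len(mem)):      # fill pass
            mem.write(addr, value)
        for addr in range(len(mem)):      # verify pass
            actual = mem.read(addr)
            if actual != value:
                errors.append((addr, value, actual))
    return errors


good = FlakyBuffer(1024)
bad = FlakyBuffer(1024, stuck_addr=300, stuck_bit=1)
print("good module errors:", moving_inversions(good))
print("bad module errors:", moving_inversions(bad))
```

In this toy setup the stuck bit happens to be 0 in the first pattern, so the fault only shows up on the complement pass. That is the reason testers write both a pattern and its inverse rather than a single value.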
.
Avatar of esabet

ASKER

Got it!!! Thanks a million.
Avatar of esabet

ASKER

Hi,
I have decided to close this question and start a new question as the problem seems to be really not the same as this one.
Thank you all and here is the link to my new question if you wish to help me along (which I would greatly appreciate, of course!):
New Question