Bigdoggit

Computer Rebooting on its own, trying to troubleshoot possible causes

I work for a company that, during normal business, has to record 8 to 12 hour videos.  I recently took over the position in an effort to fix long-standing issues and complete forgotten projects.  I say that because the computers were here before I arrived.  There seems to be an issue with computers randomly rebooting.  At first I believed this to be a setting, perhaps Automatic Updates in Windows (all systems run Windows XP Pro and we have Small Business Server 2003).  However, I have now seen the problem myself on three different computers.  Two of them are EMachines with identical components inside, and the other is a newly purchased Dell Optiplex 745.  I've become suspicious of the video capture card, but let me tell you what I know and what I've seen.  The reboots on the EMachines gave no error messages; they just rebooted.  The Dell went to a blue screen and told me the video card may have gotten caught in an infinite loop.  On the EMachines I originally suspected overheating; however, after I sent the error report to Microsoft, the replies suggested a device driver may have caused the issue.  The video capture card is an ATI TV Wonder 200, and these computers do have ATI video cards, Radeon I believe, but I can check if someone thinks that info is vital.  Getting back to overheating, though: I removed the side panel, ran all the same software again for quite a while, and had no problem.  Near the end of the day I put the panel back on and the thing restarted within a minute.  I had been running Everest Pro, and the CPU and hard drive temps did not show any climb.  There was an AUX temp, but I have no idea what it is measuring, if anything.  (I need to mention that this is an intermittent problem; I cannot always recreate it, in fact, I seldom can.)  So, now I'm wondering about overheating.  However, with the Dell, this happened once, went to a blue screen like I said earlier, then ran for 3 hours without any problem.
Since that is a new machine, and since I looked inside, I feel confident that dirt or grime isn't an issue, so the fans should be working.  Another thought that crossed my mind was the power supply not being strong enough.  We run video capture software: Active Webcam on some computers, while others use the Media Center recording software included by ATI.  Restarts have happened on computers using both software packages.  We also run a program that is kind of like Pro Tools (the audio recording software) in that it records multiple channels, something like 15 I think, but it measures medical signals: flow, ECG, brain activity, etc.  Also, we run PC Direct, which is software to remotely control a piece of equipment in the room with the patient.  The power supplies have a max of 305 to 350 W.  Usually I like 400 to 500 W, but maybe this is enough.  The processors in the new Dells are Core 2 Duo, I want to say 2 GHz.  They have 1 GB of RAM.  The EMachines are running Intel's first-generation dual-core processor.

The last thing I can think of to toss in the pot - I did check the Event Viewer, and actually saved the logs to look back at.  Right before a system error that is identical on both EMachines where I could watch the problem happen, I noticed the IMAPI CD-Burning COM Service entered its running state.  I cannot see a link, I just thought that was at least coincidental.

So, now you have my novel - I hope this wasn't too much to read and that it helps make sense of the issue here.  I will periodically check back to answer any questions anyone has.  Thank you in advance for your time and help.

BigDoggit
SOLUTION
souseran
ASKER CERTIFIED SOLUTION
ShineOn
Bigdoggit

ASKER

Hey guys, thank you for the responses.

Souseran, there are some questions I will have to answer when I get back to work.  I did notice a slight increase in the Aux temp - it got up to 144 F.  Again, I'm not sure what this is measuring.  I do not remember the exact BSOD error - I was tired and starting to believe strongly I was going to replace video cards.  Maybe I'm speaking out of frustration, but I think I have lost respect for ATI.  As far as the systems go, all OS updates are done.  These are new computers, but I didn't check for Dell updates.  As far as I can tell, there are no drivers newer than the ones on the driver CD that shipped with the capture card.  The rest of the drivers seem to be up to date.  I will be doing a more methodical check.  I sometimes wish I could live at work and check every detail, but damn, I don't have enough time between all the day-to-day issues.  One thing of interest was dump logs.  I am a little inexperienced with dump logs, so I will have to look at this more closely.  I should mention that the restarts seem to happen, at least in my experience, while video is being recorded or watched.  But that may be coincidence.  I will give you more info, hopefully this week, as I have at least one serious priority that must be dealt with by Thursday.  I am concerned about updating the BIOS.  Correct me if I am wrong, but that is somewhat risky: mostly if there were a power glitch, which I can help minimize with a UPS, but also because it cannot be undone.  I've updated BIOS before and had nothing but more problems later.

holthd - what do you mean by view its own capture data on screen?  The software does show the video on screen and records it to the hard drive as a proprietary file type until recording is finished, then converts it to a file type of my choice, which is AVI right now.  Of course, it is engineered and advertised as capable of doing this.  The PC was still running the same software when I put the panel back on and it restarted on me.  In other words, nothing changed, but the computer was working - running video capture software and running the medical software.

ShineOn - the card has a metal piece on it - I can't remember right now if that is a heatsink or just a metal protection piece or such.  If you google ATI TV Wonder 200 you could probably see a picture of the device.  But I'll look again tomorrow.  The capture card is as far away from the video card as physically possible.  

ATI suggested I go ahead and change RAM modules, change power supplies,  basically everything hardware one piece at a time.  With this problem being intermittent, at the very least that will take a very long time.  Hopefully we can narrow it down together.  Still, this may be a good time to ask for suggestions on quality video cards, and by quality I mean stable, that are less than 100 dollars.  I do not need great quality video, just enough to view a human body's movements.  No need to see facial expressions or anything.

Rather than the capture card itself (the metal piece is probably shielding around the tuner) I'd suspect the video card... they can run hot and if its heatsink/cooling fan aren't optimal it could flake out and cause a spontaneous reboot.
BIOS upgrades can usually be rolled back. Check with the vendor to see what they fix before implementing them. You might want to investigate a cooling solution. 144 degrees is usually within specs for a mobo, but if it's running that hot all the time, that's not a good thing.
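The 144 F Aux reading is easier to compare with sensor tools that report Celsius once it is converted. A minimal Python sketch of the arithmetic (the function name is only illustrative):

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# The Aux reading reported earlier in the thread.
aux_c = fahrenheit_to_celsius(144)
print(f"144 F is about {aux_c:.1f} C")
```

So the Aux sensor was reading roughly 62 C, whatever it is actually measuring.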
Is anyone still watching this thread?  I have put some more time into the issue recently.  I have minidump logs if anyone is good at reading that stuff - incidentally, if anyone can help me learn how to read them, you are a god.  I have now tested the issue with the side cover off of my PC.  I also ran Everest Ultimate 2005, which had some sensor readings.  I could see the temps for the CPU, hard drive and motherboard.  All were 55 to 58 degrees Celsius at the moment of the last random reboot.  I did configure the computer to show the blue screen.  It says IRQL_NOT_LESS_OR_EQUAL.  I was logged in as the domain administrator at the time.  I am seriously doubting overheating.  The video card does have a heatsink and fan on it.  I considered putting a PCI card fan right underneath the video card, but with the side cover off and the temp readings I saw from Everest, I'm doubting the wisdom of such a move.  I am starting to lean more and more toward a driver issue - something is not stable.  Now my problem is - how the heck do I narrow it down?  I have random restarts on computers with NVidia cards and ATI cards.  I do not have a different video capture card with which to test.  Is there some log that XP keeps, something I can use to help point me in the right direction?  My ability to just swap out parts is somewhat limited.

Let me know what you guys think, and if I don't get a response, I suppose I will start a new thread.

To make sense of minidump logs, you want to use the dumpchk.exe file from the WinXP Pro install CD support\tools folder, or download the resource kit fresh.  Here's a how-to-use-it KB article: http://support.microsoft.com/kb/315271/EN-US/

IRQL_NOT_LESS_OR_EQUAL errors are usually driver problems, but not necessarily video driver.  It could also be the NIC driver, for example.  The 8-digit hex code that usually displays with a STOP / blue-screen error helps to determine what device is causing the problem.

The dump will help there as well.  The minidump should have that same hex code that the bluescreen would display.    The crash dump analysis should also give info on the calling module and stuff like that.

If you want to attach a snippet from the crash dump analysis results, fine, but the raw minidump wouldn't help us much if at all.
Anyway, the cause of an IRQL_NOT_LESS_OR_EQUAL is usually a driver attempting to use protected memory or memory that's in use by another driver, and hammering a chunk of something it shouldn't, IIRC, so fixing your drivers should fix the problem.  The trick is - which driver(s)?
Exactly the point.  Which driver?  I do believe it could be the video capture card driver - I did some more work.  With another person's help I learned how to use the debugging tools and got some info from the minidump file.  I attached a WinDbg log.  It looks like the video capture card to everyone who sees this.  Now my current dilemma - trying to get drivers from ATI.  According to Device Manager, it looks like I have an ATI TV Wonder 200 (Pro).  I have searched for drivers, and I have the most up-to-date ones, so I'm about to try older drivers to see if I can find something stable.  My other dilemma - how do I completely remove the ATI drivers so I can be sure I am installing on as clean a slate as possible?  Any thoughts?  Also, any thoughts on the log?  They are all about the same, except some point to catalyst.exe and some to webcam.exe - these are the video recording processes.  Okay, so, let me know what you think and see.  I'll try looking for an ATI driver removal tool, unless someone points one out to me first.
Microsoft (R) Windows Debugger Version 6.8.0004.0 X86
Copyright (c) Microsoft Corporation. All rights reserved.
 
 
Loading Dump File [C:\WINDOWS\Minidump\Mini111607-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available
 
Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows XP Kernel Version 2600 (Service Pack 2) MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp_sp2_gdr.070227-2254
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055c700
Debug session time: Fri Nov 16 12:41:24.459 2007 (GMT-5)
System Uptime: 2 days 6:55:58.500
Loading Kernel Symbols
.....................................................................................................................................
Loading User Symbols
Loading unloaded module list
..................................................
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
 
Use !analyze -v to get detailed debugging information.
 
BugCheck 1000000A, {9d886240, 2, 0, 80522e06}
 
 
 
Probably caused by : memory_corruption ( nt!MiDeletePte+198 )
 
Followup: MachineOwner
---------
 
0: kd> !analyse -v
No export analyse found
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
 
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 9d886240, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, bitfield :
	bit 0 : value 0 = read operation, 1 = write operation
	bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: 80522e06, address which referenced memory
 
Debugging Details:
------------------
 
 
 
 
READ_ADDRESS:  9d886240 
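
The BugCheck line near the top of the log packs the stop code and its four arguments into one line. As a rough illustration of how to read it, here is a small Python sketch that splits that line apart and decodes Arg3 using the bitfield description printed by !analyze -v. The function name and dictionary keys are made up for this example. Masking 1000000A down to 0xA is an assumption drawn from this log, which labels the same crash both as 1000000A and as IRQL_NOT_LESS_OR_EQUAL (a).

```python
import re


def parse_bugcheck(line: str) -> dict:
    """Split a WinDbg 'BugCheck CODE, {a1, a2, a3, a4}' line into fields."""
    match = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line.strip())
    if match is None:
        raise ValueError("not a BugCheck line")
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return {
        "raw_code": code,
        # Assumption: the low 16 bits carry the stop code (1000000A -> 0xA).
        "stop_code": code & 0xFFFF,
        "address_referenced": args[0],   # Arg1: memory referenced
        "irql": args[1],                 # Arg2: IRQL at the time of the access
        # Arg3 bit 0: 0 = read operation, 1 = write operation.
        "operation": "write" if args[2] & 0x1 else "read",
        # Arg3 bit 3: 1 = execute operation.
        "execute": bool(args[2] & 0x8),
        "faulting_instruction": args[3], # Arg4: address which referenced memory
    }


info = parse_bugcheck("BugCheck 1000000A, {9d886240, 2, 0, 80522e06}")
print(hex(info["stop_code"]), info["operation"], "at IRQL", info["irql"])
```

For this dump the sketch reports stop code 0xa and a read at IRQL 2, which matches the READ_ADDRESS line in the analysis output.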


Can you list which devices are using which interrupt (IRQ)?  It could be that the TV Wonder is sharing an interrupt with too many devices, or with the wrong type of device.  Sometimes, if you can't handle the problem with software, you can handle it by forcing a PCI slot to use a different IRQ in the BIOS.

Also, Catalyst is an ATI thing that's used for more than just video capture - is it possible that you've got a newer Catalyst version than what the TV Wonder 200 supports, perhaps by virtue of having a newer ATI video card?
Interesting point on the video card maybe using newer drivers.  I have always been worried about conflicts between ATI PCI cards.  However, in this computer there is an NVidia card alongside my ATI capture card.  Now, with the IRQs, I'm not sure how to go about enumerating them - I have never messed with setting up IRQs.  Are you fairly familiar with it, and maybe have a second to give me a primer?  I'll do some research, though I may not have time for a couple of days.  I will get back to you again soon, and I'll check to see if you have any quick suggestions regarding the IRQ enumeration.
You could use the built in Windows XP System Information tool, in your Start, Programs, Accessories, System Tools menu.

Look at Hardware Resources, IRQs.  Find out how many devices are sharing the same IRQ as the TV Wonder.

Some system BIOSes will let you re-assign shared IRQ's to help alleviate device conflicts.  Often, certain PCI slots will be able to use a certain set of IRQ's so they can only be changed in pairs or groups.  If your motherboard has hard limitations as to which IRQs get assigned to what slots, and your conflicting devices are both in one of those shared groups of slots, maybe you could try a different slot for one or the other.

Often, these IRQL_NOT_LESS_OR_EQUAL errors are caused by NIC or USB drivers, so don't rule them out, especially if they're sharing the same IRQ as the TV Wonder.
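
To see at a glance which devices share an IRQ, the IRQ list from System Information can be grouped by IRQ number. The Python sketch below assumes the list has been saved as tab-separated text with lines of the form "IRQ 16", device name, status. That layout, the sample rows, and the function name are all assumptions for illustration, not output from the machines in this thread.

```python
from collections import defaultdict


def group_by_irq(text: str) -> dict:
    """Map each IRQ number to the list of device names assigned to it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        # Expect at least "IRQ <n>" and a device name; skip anything else.
        if len(fields) < 2 or not fields[0].upper().startswith("IRQ"):
            continue
        number = fields[0][3:].strip()
        if number.isdigit():
            groups[int(number)].append(fields[1])
    return dict(groups)


# Hypothetical sample rows, made up for this example.
sample = (
    "Resource\tDevice\tStatus\n"
    "IRQ 16\tExample capture card\tOK\n"
    "IRQ 16\tExample audio bus driver\tOK\n"
    "IRQ 19\tExample network adapter\tOK\n"
)

for irq, devices in sorted(group_by_irq(sample).items()):
    if len(devices) > 1:
        print(f"IRQ {irq} is shared by: {', '.join(devices)}")
```

Any IRQ that lists the capture card together with a NIC, USB controller or audio device is a candidate for the slot-swap or BIOS reassignment described above.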
Well, I looked into the IRQs.  I have included the listing.  The TV Wonder card shares its IRQ with one other device - "Microsoft UAA Bus Driver for High Definition Audio".  I know it is audio, and it sounds like something that could be part of the motherboard or an audio card.  The TV Wonder 200 has an audio output that, I believe, must be plugged into my onboard audio for both recording and playback, not just one or the other, and therefore I assume this bus driver is integrated into the motherboard.  At any rate, I wonder whether that could be the issue.  I was wondering, do IRQs have a reputation for needing to be in a certain order?  Are there ever issues that you know of where an IRQ can't call a higher IRQ?  I will try to do some research, but your last post saved me some looking, so I thought I'd throw this out too.  Let me know what you think about my IRQ setup and if you see something suspicious.  At this point, I don't know what to think about it.