I have recently completed the rebuilding of a computer. Here's a link to that problem conversation:
The machine generally performs well, and my customer is reasonably happy with its overall performance, except when the machine hangs; then he's not. The symptom is that the computer stops responding to some combination of the mouse and keyboard. After several minutes of maddening waiting, he arrives at a solution to the problem: turn the machine off. A problematic side effect is that, after the machine comes back up, the voice recognition files in the chief application he uses, Dragon Naturally Speaking, are corrupt, and the application cannot perform speech to text translation until he deletes these *.voc files and goes through the somewhat lengthy and cumbersome process of speaker dependent retraining. Here's a link to that problem conversation:
Here is the recipe for the new equipment that went into the computer:
Intel D915GAG mobo
Intel P4 3.0GHz LGA775 processor
PQI memory, 2 sticks for a total of 512MB
Antec TruePower 430 power supply
MSI XA52P CD+DVD combo drive
Round IDE cables for the harddrive and floppy.
Windows XP Professional OEM w SP2 and all MSFT+INTC updates
BIOS reflashed w Intel update
The Intel temperature and fan monitor is installed and operational.
After assembly, the machine was burned in for several days using StressTest and Seti@Home. Heat and sound issues were identified and corrected.
Customer equipment already in place:
Dell flat screen 19" monitor (nice!).
Logitech cordless mouse.
Logitech corded keyboard.
Note that this hanging symptom has happened once before with him, causing him to turn off the power to regain control, and resulting in the corruption of the Dragon speech recognition files. The solution was to delete the suspect *.voc files and retrain, but the root cause of the hanging was not determined.
The display presented on the monitor is normal. There is no BSOD or any strange artifacts currently being reported.
When I talked w my customer last night, he reported that the hanging symptom has reoccurred. Leading up to the problem, he reports that he was navigating through the start menu to launch a program (Trend Micro), launched it, and then the machine hung. The mouse would move, but there was no response from the keyboard. (This is different from the first time, when there was no response from either the mouse or the keyboard.) After waiting 2-5 minutes, he powered down and restarted.
This problem is intermittent. So far, over the past 2 weeks or so, this has occurred 2 times, with maddening results because there is a close correlation with this hanging and corrupting the speech recognition files, which puts him out of service.
Note that the original motivation for doing this rebuild was that the previous mobo had a failed keyboard controller, and there was no new replacement mobo available. Also, the machine failed to keep up with the speech to text translation, because the CPU was operating at 100% utilization just servicing errant keyboard interrupts. Here's a link to that problem conversation:
I do need to confirm my understanding of what led up to the problem, and I will report more information when it is available, but I'm really interested in whether there is some hw or sw method of tracking the operational state of the machine leading up to the problem.
Some of the things that I will look at are:
The log of the temperature monitor, possible heat problem. May not be reliable because the log is stored in a file on disk.
The memory timing. The sticks say 2-3-3-6 but the mobo is reporting something slower 2.5-4-4-8. Change settings back to auto?
Check the event log on the mobo, to see if anything is reported there.
Swap out the current keyboard and rodent with ones I use that are known good. Possible low battery in mouse.
The environment. Wouldn't it be funny if something at the site, which affected the previous mobo by nuking the keyboard controller, is still there and affecting this new mobo in a different way?
I thought NTFS was robust and didn't lose data when the power is gracelessly removed. This is acting like FAT. Am I being unrealistic?
Any thoughts on troubleshooting an intermittent error with serious consequences, which happens once every week or two?
Any suggs on making the filesystem more robust so that it is less likely to lose data?
Any suggs on a possible adjustment within Dragon Naturally Speaking to make it less susceptible? I'm suspecting that Dragon is leaving its *.voc files in an exposed state, so that if the power were to gracelessly go away, the underlying files end up corrupt. I thought these files were only opened for rw during training, not during speech to text translation.
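As a stopgap while the root cause is unknown, one option is to keep dated copies of the *.voc files so a corrupt set can be restored instead of retrained. Here is a minimal sketch, assuming Python is installed on the machine and that the script is run while Dragon is closed. The function name and both directory arguments are hypothetical; I have not confirmed where Dragon keeps its speaker profile.

```python
import os
import shutil
import time


def backup_voc_files(profile_dir, backup_root):
    """Copy every *.voc file under profile_dir into a new timestamped
    folder under backup_root, preserving the subfolder layout.

    Both paths are placeholders: point profile_dir at wherever the
    speaker profile actually lives. Keep backup_root outside
    profile_dir so old backups are not copied again.
    """
    dest_root = os.path.join(backup_root, time.strftime("%Y%m%d-%H%M%S"))
    copied = []
    for folder, _subdirs, files in os.walk(profile_dir):
        for name in sorted(files):
            if not name.lower().endswith(".voc"):
                continue
            # Rebuild the same relative folder under the backup location.
            rel = os.path.relpath(folder, profile_dir)
            target_dir = os.path.normpath(os.path.join(dest_root, rel))
            os.makedirs(target_dir, exist_ok=True)
            target = os.path.join(target_dir, name)
            shutil.copy2(os.path.join(folder, name), target)
            copied.append(target)
    return copied
```

Run from a scheduled task or by hand after a good session, this would at least turn a retraining job into a file restore. It does nothing about the hang itself.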
Any suggestions on how to study the problem, and/or install monitors to record the operational state of the OS and the machine leading up to the hanging symptom or power reset?
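To make the idea of a monitor concrete, here is a rough sketch of a heartbeat logger, assuming Python is installed on the machine. It appends a timestamped line every few seconds and asks the OS to flush each line to disk, so the last line in the file brackets the time of the hang. The function names, the log file name, and the 10 second interval are all my own placeholders, and a flush request does not guarantee the drive's own cache has written the data.

```python
import os
import time


def write_heartbeat(path, note=""):
    """Append one timestamped line and ask the OS to flush it to disk."""
    line = (time.strftime("%Y-%m-%d %H:%M:%S") + " " + note).rstrip() + "\n"
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        # Push the data past the OS file cache; the drive may still buffer it.
        os.fsync(f.fileno())
    return line


def run(path, interval_s=10, beats=None):
    """Write heartbeats forever, or stop after `beats` lines if given."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path, "beat={}".format(count))
        count += 1
        if beats is None or count < beats:
            time.sleep(interval_s)
    return count


# Example (hypothetical file name): run("heartbeat.log")
```

If the heartbeats stop while the mouse still moves, that suggests the OS itself stalled. If they continue right up to the power-off, the hang is more likely limited to input or the foreground application.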
I'll be heading over there in about 2 hours, but I will be frequently checking for responses. Thank you in advance for noodling on the problem with me!