ChrisEddy (United States of America)

asked on

Requesting suggestions to troubleshoot XP Pro hanging on a newly rebuilt computer

Gentlemen,

I have recently completed the rebuilding of a computer.  Here's a link to that problem conversation:
https://www.experts-exchange.com/questions/21170717/Mobo-suggs-uATX-DDR2-IDE133-PCI-X.html

The machine generally performs well, and my customer is reasonably happy with its overall performance, except when the machine hangs. The symptom is that the computer stops responding to the mouse, the keyboard, or both. After several minutes of maddening waiting, he arrives at a solution: turn the machine off. A problematic side effect is that, after the machine comes back up, the voice recognition files of the chief application he uses, Dragon NaturallySpeaking, are corrupt, and the application cannot perform speech-to-text translation until these *.voc files are deleted and he goes through the somewhat lengthy and cumbersome process of speaker-dependent retraining. Here's a link to that problem conversation:
https://www.experts-exchange.com/questions/21193966/Dragon-NaturallySpeaking-An-internal-recognizer-error-has-occurred.html

Here is the recipe for the new equipment that went into the computer:

  Intel D915GAG mobo
  Intel P4 3.0 GHz LGA775 processor
  PQI memory, 2 sticks for a total of 512 MB
  Antec TruePower 430 power supply
  MSI XA52P CD+DVD combo drive
  Round IDE cables for the hard drive and floppy
  Windows XP Professional OEM with SP2 and all MSFT+INTC updates
  BIOS reflashed with Intel update
  Intel temperature and fan monitor installed and operational

After assembly, the machine was burned in for several days using StressTest and Seti@Home.  Heat and sound issues were identified and corrected.  

Customer equipment already in place:

  Dell 19" flat-screen monitor (nice!)
  Logitech cordless mouse
  Logitech corded keyboard

Note that this hanging symptom has happened once before with him, causing him to turn off the power to regain control, and resulting in the corruption of the Dragon speech recognition files.  The solution was to delete the suspect *.voc files and retrain, but the root cause of the hanging was not determined.

The display presented on the monitor is normal.  There is no BSOD or any strange artifacts currently being reported.  

When I talked with my customer last night, he reported that the hanging symptom has recurred. Leading up to the problem, he was navigating through the Start menu to launch a program (Trend Micro), launched it, and then the machine hung. The mouse would move, but there was no response from the keyboard. (This is different from the first time, when there was no response from either the mouse or the keyboard.) After waiting 2-5 minutes, he powered down and restarted.

This problem is intermittent. Over the past 2 weeks or so, it has occurred 2 times, with maddening results, because the hangs are closely correlated with corruption of the speech recognition files, which puts him out of service.
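
With a failure this rare, a rough way to decide how long the machine must run cleanly after a change before calling it fixed is a rate estimate. The sketch below assumes the hangs arrive randomly at a steady rate (a Poisson process), which is an assumption and not something established in this thread; the function name is hypothetical.

```python
import math


def days_without_failure_needed(failures, observed_days, confidence=0.95):
    """Hang-free days needed before a fix can be trusted at `confidence`.

    Assumes hangs occur as a Poisson process at the rate seen so far
    (failures / observed_days). If the fix changed nothing, the chance of
    seeing zero hangs in t days is exp(-rate * t); solve for the t that
    pushes this chance down to 1 - confidence.
    """
    rate_per_day = failures / observed_days
    return -math.log(1 - confidence) / rate_per_day


# Two hangs in about 14 days, 95% confidence:
print(round(days_without_failure_needed(2, 14), 1))  # 21.0
```

With two hangs in roughly two weeks, this comes to about three weeks of hang-free running before a fix looks real at 95% confidence; a longer gap between failures stretches the wait proportionally.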

Note that the original motivation for this rebuild was that the previous mobo had a failed keyboard controller, and no new replacement mobo was available. Also, the machine failed to keep up with speech-to-text translation, because the CPU was running at 100% utilization just servicing errant keyboard interrupts. Here's a link to that problem conversation:
https://www.experts-exchange.com/questions/21162188/Fresh-XP-Pro-SP2-install-System-100-cpu-DPCs-are-huge.html

I do need to confirm my understanding of what led up to the problem, and I will report more information when it is available, but I'm really interested in whether there is some hardware or software method of tracking the operational state of the machine leading up to the problem.

Some of the things that I will look at are:
  The log of the temperature monitor, for a possible heat problem. This may not be reliable, because the log is stored in a file on disk.
  The memory timing. The sticks say 2-3-3-6, but the mobo is reporting something slower, 2.5-4-4-8. Change the settings back to auto?
  The event log on the mobo, to see if anything is reported there.
  Swap out the current keyboard and rodent for ones I use that are known good. Possible low battery in the mouse.
  The environment. Wouldn't it be funny if whatever at the site nuked the previous mobo's keyboard controller is still there and affecting this new mobo in a different way?

Some questions:
  I thought NTFS was robust and didn't lose data when the power is gracelessly removed. This is acting like FAT. Am I being unrealistic?
  Any thoughts on troubleshooting an intermittent error with serious consequences that happens once every week or two?
  Any suggs on making the filesystem more robust, so that it is less likely to lose data?
  Any suggs on a possible adjustment within Dragon NaturallySpeaking to make it less susceptible? I suspect that Dragon leaves its *.voc files in an exposed state, so that if the power gracelessly goes away, the underlying files end up corrupt. I thought these files were only opened for read/write during training, not during speech-to-text translation.
  Any suggestions on how to study the problem and/or install monitors to record the operational state of the OS and the machine leading up to the hanging symptom or power reset?
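
On the last question, one low-tech way to record the machine's state right up to a hang or power reset is a heartbeat log that is forced to disk on every write, which also addresses the worry that a log kept in a file may not survive. A minimal sketch, assuming Python 3.6 or later (which does not itself run on XP, so treat this as an illustration of the technique); the function name and the optional `status` hook (for example, a temperature reading) are hypothetical. `os.fsync` asks the OS to commit the data, but it cannot override a drive's own write cache.

```python
import os
import time
from datetime import datetime


def log_heartbeat(path, interval_s=5.0, count=None, status=None):
    """Append one timestamped "alive" line to `path` every `interval_s` seconds.

    Each line is flushed and fsync'd, so after a hard power-off the last
    line on disk shows roughly when the machine stopped responding.
    `status` is an optional, hypothetical callable returning extra text
    (e.g. a temperature reading); `count` limits the number of lines,
    and None means run until the process is killed.
    """
    written = 0
    with open(path, "a", encoding="utf-8") as log:
        while count is None or written < count:
            stamp = datetime.now().isoformat(timespec="seconds")
            extra = status() if status else ""
            log.write(f"{stamp} alive {extra}".rstrip() + "\n")
            log.flush()             # move Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to commit it to disk
            written += 1
            if count is None or written < count:
                time.sleep(interval_s)
    return written
```

Started at logon and left running, the gap between the last heartbeat and the next boot brackets when the machine stopped; a short interval gives finer resolution at the cost of more disk writes.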

I'll be heading over there in about 2 hours, but I will be frequently checking for responses.   Thank you in advance for noodling on the problem with me!


SOLUTION
stevenlewis

SOLUTION
ChrisEddy

ASKER

Ack on the event log!

I have a keyboard and mouse with a PS2 style connector to use.  

Never heard of AntiCrash.  I will research this.  Thank you for the tip!
Btw: This is a fresh installation of XP Pro, i.e. less than 2 weeks old. Plus, I've configured things so that user-level changes can't have system-wide effects.


rindi
Generally NTFS is robust. If the disk doesn't work properly, though, that won't help much. You might just want to keep an eye on the disk. Does it get hot? I think I saw you mention a Maxtor drive, and I have seen a couple of Maxtors get really hot and die. Consider a fan to cool the disk. Also make sure the power supply connector powering your disk makes good contact.

How does NaturallySpeaking save its data? Does it keep the data in memory and periodically write it to disk, or does it write continuously? Does it make one large file or many small ones? Can you adjust that? If yes, try to make the file size smaller. What does a minute of speech convert to in disk space? Is the data uncompressed, or is it in some form similar to MP3? Can you adjust the bandwidth at which the data is recorded (e.g. 100 Hz-10,000 Hz)?

I just suggest that, if possible, you keep those files as small as possible and get the system to save them as often as possible, ideally continuously. If the system crashes before the data has been written to disk, that data is lost, and what hasn't been written to disk can't be saved even by NTFS, because that data just never was there!
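
The point about unwritten data can be illustrated with a common save pattern an application can use so that a crash leaves either the old file or the new one, never a half-written mix: write to a temporary file, force it to disk, then rename it over the original. This is a general sketch, not a description of how Dragon NaturallySpeaking saves its *.voc files, which the thread never establishes; the function name is hypothetical. NTFS journals file-system metadata rather than file contents, so this kind of protection has to come from the application.

```python
import os
import tempfile


def save_atomically(path, data: bytes):
    """Replace `path` with `data` so a crash leaves either the old or new file.

    Writes to a temporary file in the same directory (so the rename stays on
    one volume), forces it to disk, then renames it over the original.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to commit the new contents
        # Rename over the original: atomic on POSIX; on Windows os.replace
        # also overwrites an existing target.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file, keep the original
        raise
```

The design choice is that the original file is never opened for writing, so a power cut mid-save can at worst leave a stray temporary file behind.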
SOLUTION

ChrisEddy

ASKER

Gentlemen,

Thank you all for responding so quickly!

When I arrived today, my customer told me he had replaced the batteries in both the mouse and keyboard. I had brought a wired keyboard and wired mouse to use, but this offer was declined. But I did check the event and temperature alarm logs, and there was nothing interesting happening at that time. One down.

The error reported by Dragon was different this time and did not relate to the *.voc files, but rather was a program initialization error. Renamed the folder with the *.voc files, relaunched Dragon, and the problem persisted. Went through the repair procedure, and the problem persisted. Uninstalled/rebooted/installed/ran Dragon, and the problem still persisted. Uninstalled, searched for and removed entries from the registry and the filesystem, reinstalled, and this time I got a permissions error on 4 different directories. Relaxed the permissions on the files and folders reported by Dragon, then reran Dragon as a normal user, and it failed silently. Dragon just started, showed the splash screen for 2 seconds, then exited. Switched over to an administrative account, ran Dragon, and it launched fine. Hmmm, this was diagnostic in a positive way. Relaxed permissions on more Dragon folders and files, switched over to a mere user account, launched Dragon, and it came up fine and asked for training. So this ended up being a combination problem. Two down.

Some new news: The symptom of hanging while navigating through the start menu actually happened while I was using the machine.  Talk about good fortune!  Specifically, the item I selected was rendered as mostly transparent, and stayed that way indefinitely.  The mouse moved fine, and I brought up the task manager to see what process was running, so the keyboard worked too.  I think the process was called sysfade.exe, or something relating to fading.  After I killed that process, the ghosted menu item went away, and normal appearance returned.  After this, I disabled the animation of fade-in and fade-out of menu items, so this should not happen again.  Three down.

Also while I was there, there were several complaints of processor temperature exceeding the warning threshold, plus the sound of a fan spinning faster and harder when the computer was being used.  I opened the case, and used my finger to gently rock the 4 posts of the processor cooler, and found two of them with more play than I would call snug.  Reseated the cooler on the processor, the posts are now snug, and the processor temperature seems to be running 10-20 degrees cooler.  Four down.

On specifying RAM speed in the BIOS, you're right, but I forgot to do that today and will the next time I'm there. I'm less interested in having the fastest machine on the block, and much more interested in reliability and robustness to the point of being unable to kill it with a gun.

On the architecture of Dragon, I didn't check whether the file timestamps change as a result of recognizing speech, but I can check this next time I'm there and report findings. Also, I don't know what Dragon's blocking factor is when doing I/O with the *.voc files, or how to set the frequency of the application's data backups. The documentation is sparse enough to possibly be a useful but scratchy substitute for toilet paper, if it were printed.


ASKER CERTIFIED SOLUTION
tapkep, overheating alarms usually depend on how the threshold has been set. This can be a user-defined setting, and if the user sets it too low, the alarms will pop up whenever those values are reached. Thresholds should be adjusted according to the specs of the CPU, and they also depend on where and how the temperature is measured. Obviously, if the sensor isn't right on the processor's die, you won't get an accurate reading. You would need to know what the actual die temperature is when you measure a certain temperature at a certain place. Then you may be able to calculate the actual temperature and set the threshold value accordingly. In short, the temperature readouts you get will always be inaccurate, unless the temperature sensor is built into the processor or the mainboard manufacturer has done an extensive (and expensive) calibration, which I don't believe is being done.
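
The calibration described above comes down to simple arithmetic. A sketch assuming you have measured or looked up how much hotter the die runs than the sensor reads (the offset); the function names and every number below are hypothetical examples, not Intel specifications. The last helper converts a threshold bump given in Fahrenheit, which for a temperature difference uses only the 5/9 factor and no 32-degree offset.

```python
def estimated_die_temp(sensor_reading_c, sensor_offset_c):
    """Estimate the die temperature from a sensor that sits away from the die.

    sensor_offset_c: die temperature minus sensor reading, found by calibration.
    """
    return sensor_reading_c + sensor_offset_c


def alarm_threshold(die_max_c, sensor_offset_c, margin_c):
    """Sensor reading at which to alarm so the die stays margin_c below its maximum."""
    return die_max_c - sensor_offset_c - margin_c


def delta_f_to_c(delta_f):
    """Convert a temperature *difference* (not a reading) from F to C."""
    return delta_f * 5 / 9


# Hypothetical example: die limit 70 C, die runs 8 C hotter than the sensor, 5 C margin.
print(alarm_threshold(70, 8, 5))     # 57
print(estimated_die_temp(50, 8))     # 58
print(round(delta_f_to_c(5), 2))     # 2.78
```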

ChrisEddy

ASKER

Gentlemen,

Answers to your questions:

OS was XP Home, is now XP Pro. And all updates have been applied.

Version of Dragon = 7, the latest.  And yes, it was installed from an administrator account.  Btw: The perms on all of the top level folders and the log files needed to be relaxed, to allow modification+writing by Users.  Otherwise the program fails, silently and otherwise, during startup.

Btw: I was interested that the "hanging" symptom was indeed just that, a symptom produced by a visual artifact which (unfortunately) motivated my customer to gracelessly power down the computer, which started this mess. I suspect that disabling menu animation and fading is a proper solution for the root cause, a bug in sysfade.exe.

On the temperature alarms, I did boost the temperature threshold about 5F during burn-in, to reduce the frequency of popups complaining about transient temperature spikes, and I believe I have it in an appropriate range based on INTC thermal data plus a margin of safety. These temperature complaints were far more frequent than when I was driving the machine hard for days at a time during burn-in. Since airflow through the chassis has remained roughly constant (relatively speaking, since the fans are variable speed), this suggested that the processor cooler wasn't doing its thing right or enough. The non-snug posts that hold the cooler against the processor also suggested this. The fairly significant temperature drop after reseating the cooler confirmed it.

On switching from cordless to corded touch points, I offered, but he doesn't want to after he changed the batteries and believes this was the problem.  

I like your thinking about changing the permissions on the *.voc files to read-only for users. Provided the program allows this, it would eliminate one opportunity for corruption. I will try to remember this the next time I am over there, try it, and report results.


SOLUTION
stevenlewis

Thanks, and glad we could help!