Serious Windows XP network problem - delayed write errors and missing files and folders
Posted on 2007-03-27
This is a very serious problem and I really appreciate any help.
Here is the scenario:
I have been working on a network consisting of 6 computers running Windows XP Pro. One of these machines acts as a file server and a database server for ACT 2005. The server sits by itself in a closet and nobody touches it except for me. The server is just a basic Windows box and is fast, but not fancy or unusual in any way. It has a RAID 1 configuration on two 80 GB SATA drives with only a single partition. Everything is on this partition.
The client computers are all just HP desktop computers connecting to the server through a mapped drive linked through the server's static IP (i.e., the mapped drive connects to \\192.168.1.10\fileshare). All computers are using TCP/IP as the only networking protocol and all have static IPs.
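Since every client reaches the share by IP, one rough way to see whether a client intermittently loses the server is to time a plain TCP connection to it. The following is only a sketch under assumptions: it assumes the share is served over TCP port 445 (the post does not say which port is in use), and the function name `probe` is made up for illustration. A successful connect shows only that the port answers, not that file sharing itself is healthy.

```python
import socket
import time

SERVER_IP = "192.168.1.10"  # server address from the post
SMB_PORT = 445              # assumption: the share is reached over TCP 445


def probe(host, port, timeout=3.0):
    """Try one TCP connection; return seconds taken, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None


if __name__ == "__main__":
    elapsed = probe(SERVER_IP, SMB_PORT)
    if elapsed is None:
        print("could not connect to", SERVER_IP)
    else:
        print("connected in %.3f seconds" % elapsed)
```

Run repeatedly from the affected client (for example from a scheduled task), a log of failures or slow connects could be lined up against the times users report freezes.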
The networking gear includes a new Linksys 16-port 10/100 switch, a Cisco 800 series router, and a Cisco 800 series DSL modem. There is also a single Linksys wireless access point (WAP54g) connected to the switch. The access point is using WPA with a very long password.
The client computers are all running ACT 2005 and access the server for the primary database. As I said, they use mapped drives to the server (1 mapped drive per client PC) and they use Paperport 11 to view the shared files. I recognize the issues regarding using Paperport in a network environment, but have had an impossible time convincing the owner of the business to stop using it.
I flew from NY to Montana to completely redo this network in January, and here is what I did and why. The server was always the server for the ACT database, but the files accessed by Paperport were actually stored on and shared from one of the users' PCs. I moved all of those files to the server and reset the permissions and ownership on all files and subfolders. I reconfigured the network so that computers used static IP addresses instead of DHCP, and I configured mapped drives to connect using the server IP instead of the NetBIOS name. I disabled automatic discovery of network shares and printers on all computers. I removed IPX protocols from 2 of the clients. I also did extensive spyware and virus scans on all computers and installed Avast antivirus on all machines, including the server. I did a lot to improve the speed of all computers by disabling system restore, error reporting, indexing, etc. I spent 4 days from 8am to midnight completely redoing this network. I also changed the connection speed setting in the driver for the server's network card. It was set to 100 Mbps half-duplex for some crazy reason, so I set it to full-duplex. I also formatted two of the computers and did very careful full reinstalls of Windows and all software.
When I left, everything was a million times better than before. Network access was much faster across the board. Access to the ACT database was significantly improved, and people were not having Paperport crash all the time. Then, about five weeks after I left, one user (using one of the computers I formatted and reinstalled) was having periodic freezes in both ACT and Paperport, but the freeze-ups were not affecting anyone else at all. I suspected that she was just being impatient and clicking on things like crazy when she experienced even slight network lag (she seems like that type of user). Then, a few days after that, she got a delayed-write error while saving a shared Excel spreadsheet that was on the server. Several days later, two other users got a delayed-write error while saving the same document. I ran a chkdsk /r on the server and nobody got any more errors for about two weeks. I also completely remade the Excel file that caused the original error because I thought it was corrupt. More than two weeks went by without any freeze-ups or delayed-write errors, but today someone called and said that they got a delayed-write error on a Word doc they were saving, and then the strangest thing happened:
The new Excel spreadsheet that I remade after the first delayed-write errors mysteriously disappeared. As I was asking all the users if they might have deleted it or moved it by accident (everyone swears they didn't, and I couldn't find the file even after searching all computers), someone else noticed that an entire directory had disappeared (in a folder separate from the Word file). I ran 'chkdsk' on the server and it gave me the following errors:
CHKDSK discovered free space marked as allocated in the master file table bitmap
CHKDSK discovered free space marked as allocated in the volume bitmap
I got everyone out of the server and ran a chkdsk /r and rebooted the server. The logfile read as follows:
A disk check has been scheduled.
Windows will now check the disk.
Cleaning up minor inconsistencies on the drive.
Cleaning up 6 unused index entries from index $SII of file 0x9.
Cleaning up 6 unused index entries from index $SDH of file 0x9.
Cleaning up 6 unused security descriptors.
CHKDSK is verifying file data (stage 4 of 5)...
File data verification completed.
CHKDSK is verifying free space (stage 5 of 5)...
Free space verification is complete.
78140128 KB total disk space.
15393476 KB in 58857 files.
24300 KB in 9884 indexes.
0 KB in bad sectors.
161900 KB in use by the system.
65536 KB occupied by the log file.
62560452 KB available on disk.
4096 bytes in each allocation unit.
19535032 total allocation units on disk.
15640113 allocation units available on disk.
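As a sanity check, the totals in the summary above can be verified with a little arithmetic. The sketch below is a minimal Python example that parses the figures exactly as pasted and confirms they agree with each other. The helper names `first_number` and `check_summary` are made up for illustration. The assumption that the log file is already counted inside the "in use by the system" figure is a reading of the numbers, not something the log states.

```python
import re

# Summary lines copied from the chkdsk log above.
CHKDSK_SUMMARY = """\
78140128 KB total disk space.
15393476 KB in 58857 files.
24300 KB in 9884 indexes.
0 KB in bad sectors.
161900 KB in use by the system.
65536 KB occupied by the log file.
62560452 KB available on disk.
4096 bytes in each allocation unit.
19535032 total allocation units on disk.
15640113 allocation units available on disk.
"""


def first_number(pattern, text):
    """Return the first captured integer for pattern, or raise if absent."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError("pattern not found: " + pattern)
    return int(match.group(1))


def check_summary(text):
    """Cross-check the chkdsk totals; each value is True when consistent."""
    total_kb = first_number(r"(\d+) KB total disk space", text)
    files_kb = first_number(r"(\d+) KB in \d+ files", text)
    index_kb = first_number(r"(\d+) KB in \d+ indexes", text)
    bad_kb = first_number(r"(\d+) KB in bad sectors", text)
    system_kb = first_number(r"(\d+) KB in use by the system", text)
    avail_kb = first_number(r"(\d+) KB available on disk", text)
    unit_bytes = first_number(r"(\d+) bytes in each allocation unit", text)
    total_units = first_number(r"(\d+) total allocation units", text)
    free_units = first_number(r"(\d+) allocation units available", text)

    return {
        # Allocation units times unit size should equal the KB totals.
        "total_matches_units": total_units * unit_bytes == total_kb * 1024,
        "free_matches_units": free_units * unit_bytes == avail_kb * 1024,
        # Assumption: the log file is included in the system figure,
        # so it is not added separately here.
        "parts_sum_to_total": (
            files_kb + index_kb + bad_kb + system_kb + avail_kb == total_kb
        ),
    }


if __name__ == "__main__":
    for name, ok in check_summary(CHKDSK_SUMMARY).items():
        print(name, "OK" if ok else "MISMATCH")
```

All three checks pass for the figures above, so the summary is internally consistent and reports no bad sectors. That says nothing about where the missing file and folder went.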
However, after I ran the chkdsk, the Excel file and the other folder are still gone. Because I am obsessive about backing everything up each night (and online), we were able to just grab the missing files from the backup, but it is not good that this is happening and I know it points to a more serious problem. I considered that one of the RAID drives might be failing. I also disabled the wireless network in case someone got on and was deleting files, and I told everyone they would lose a hand if they so much as touched Paperport. For now I am allowing everyone access to the ACT database and the shared files, but I really need to figure out what's up. Thanks for taking the time to read all this and thanks very much for your help.
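Since a nightly backup exists, one way to catch further disappearances early is to compare the most recent backup tree against the live share and list anything that is in the backup but no longer on the server. The following is a rough Python sketch, not a finished tool. The function names are made up, and both folder paths in the usage section are hypothetical placeholders, because the post does not say where the backups or the shared folder live on disk.

```python
import os


def relative_files(root):
    """Collect every file under root as a path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.add(os.path.relpath(full_path, root))
    return found


def missing_from_live(backup_root, live_root):
    """Return sorted relative paths present in the backup but not live."""
    return sorted(relative_files(backup_root) - relative_files(live_root))


if __name__ == "__main__":
    # Hypothetical locations; substitute the real backup and share folders.
    backup_root = r"D:\backup\fileshare"
    live_root = r"C:\fileshare"
    for path in missing_from_live(backup_root, live_root):
        print("missing from live share:", path)
```

Files that users deleted or renamed on purpose since the last backup will also show up in the list, so the output is a starting point for questions rather than proof of corruption.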