dcrobinson1965

VPN, delayed write, and Rasman hang

I have a Windows 2003 server with the following setup:
- A connection to a remote server (Windows 2008 R2) using a PPTP VPN.
- I am then mounting a remote share, mapping to a drive letter.
- The remote share contains a large TrueCrypt volume file, which I am then mounting using TrueCrypt, mapping to another drive letter.
- A process (PHP) executes xcopy to copy files periodically to the mounted secure volume.
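
For readers trying to reproduce this chain, here is a minimal Python sketch of the order of operations described in the list above. It only builds the command lines; nothing here comes from the original poster's setup. The phonebook entry name, share path, drive letters, volume file name, and the xcopy switches (`/D /E /Y`) are all assumptions for illustration, and it assumes the VPN entry has saved credentials so `rasdial` can dial it by name.

```python
from typing import List

Command = List[str]


def bring_up_commands(
    vpn_entry: str,
    share_unc: str,
    share_drive: str,
    volume_file: str,
    secure_drive: str,
    password: str,
    truecrypt_exe: str = "TrueCrypt.exe",
) -> List[Command]:
    """Return the three bring-up commands in dependency order.

    All argument values are hypothetical; the thread does not give them.
    """
    # The TrueCrypt container is a plain file sitting on the mapped share.
    volume_path = f"{share_drive}\\{volume_file}"
    return [
        # 1. Dial the PPTP VPN phonebook entry (assumes saved credentials).
        ["rasdial", vpn_entry],
        # 2. Map the remote share to a drive letter.
        ["net", "use", share_drive, share_unc],
        # 3. Mount the container: /v volume, /l letter, /p password,
        #    /q quit after mounting, /s silent.
        [truecrypt_exe, "/v", volume_path, "/l", secure_drive.rstrip(":"),
         "/p", password, "/q", "/s"],
    ]


def copy_command(source_dir: str, secure_drive: str, target_dir: str) -> Command:
    """Build an xcopy call into the mounted volume.

    /D copies only newer files, /E includes subdirectories, /Y suppresses
    overwrite prompts. These switches are an assumption, not from the thread.
    """
    return ["xcopy", source_dir, f"{secure_drive}\\{target_dir}", "/D", "/E", "/Y"]
```

Each list could be handed to `subprocess.run(cmd, check=True, timeout=60)` in turn; a timeout on every step seems worthwhile here, since a hung step is exactly the failure being described.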

This works great, sometimes for several days, until at some seemingly "random" point I get an NTFS delayed write failure in the event log, followed by a TrueCrypt error.

Once this has "broken", the PHP process continues generating these events (never recovers). At some point, the failure becomes terminal and the VPN connection drops.

Sometimes, the VPN connection can be redialed. However, often the RASMAN service is completely locked up, and won't respond to stop requests. At this point, the only thing I can do (I think) is reboot the machine.

There are no errors or unusual events reported at the remote (2008) server end.

Any suggestions as to what's going on would be very helpful.
digitap

Just so I can understand, you are mapping a drive via a VPN, then mounting a TrueCrypt drive from a TrueCrypt file over the VPN, right?

Do you have any scheduled restarts of either server?  I'm going to say that your VPN is going to drop occasionally.  It may recover itself seemingly keeping the VPN up, but that's going to adversely affect your mounted TrueCrypt drive.  If I were you, I don't believe I would expect it to stay online so consistently.  I would expect that I'd have to take things offline with some scheduled reboot, etc.
dcrobinson1965

ASKER

Some background... we're using the remote server as an "untrusted" disaster recovery solution. The server is a virtual server run by a 3rd party. We're copying our "confidential" data to it securely, because we have little control over the security of the virtual server, and don't want to actively monitor it against hacking etc. We already have failover systems and so on, but this is a cheap way of protecting against earthquakes or plagues of locusts!

Yes, we're mounting a regular "CIFS / SMB" share over a VPN. The share contains a large (20GB) TrueCrypt volume (i.e. a big file), which we're mounting from the client end using TrueCrypt of course. This then appears on the client as another drive letter, to which we then mirror various folders and files as they change on our system.

Because the remote virtual server is not managed by us, it can occasionally reboot itself in order to apply Microsoft patches etc. Our data copying program can detect *some* write failures and loss of VPN, at which point it tries to close everything down and then reestablish the connection.
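
To illustrate the detect, close down, and reestablish behaviour described above, here is a rough Python sketch. It is not the actual copying program: `copy_once`, `tear_down`, and `bring_up` are hypothetical callables standing in for whatever the real software does, and treating `OSError` as the failure signal is an assumption.

```python
from typing import Callable


def copy_with_recovery(
    copy_once: Callable[[], None],
    tear_down: Callable[[], None],
    bring_up: Callable[[], None],
    max_attempts: int = 3,
) -> bool:
    """Run one copy pass, rebuilding the VPN/share/volume chain on failure.

    Returns True if a copy pass succeeded within max_attempts, else False.
    All three callables are hypothetical placeholders.
    """
    for _ in range(max_attempts):
        try:
            copy_once()
            return True
        except OSError:
            # A write failure or lost VPN: close everything down, best effort.
            try:
                tear_down()
            except OSError:
                pass  # Parts of the chain may already be gone.
            # Then try to reestablish; if that fails, the next attempt's
            # copy will fail too and trigger another teardown.
            try:
                bring_up()
            except OSError:
                continue
    return False
```

Bounding the number of attempts matters in this scenario: once the chain is truly stuck (for example when RasMan no longer responds), retrying forever would only keep generating the same failure events.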

The insurmountable problem is that, on occasion, the VPN connection fails, and won't restart. That is, it refuses to dial, because RasMan appears to be completely locked up. The service doesn't respond to start/stop requests, and the only solution seems to be a server reboot. Note that the remote server is fine at this point, and can be connected to from elsewhere over VPN.
Do you have a scheduled restart for your server?  If not, that's what I'd recommend.

Despite that, you have an interesting idea for a disaster recovery solution.
We've just (yesterday) modified our software that does the copying, so it will periodically stop, unmount the TrueCrypt volume and the shared drive, and then disconnect the VPN. It then restarts everything. This is slightly inefficient, because on startup the mirroring software has to scan all the folders for changes, whereas once a synchronised "steady state" has been established it just uses the Microsoft APIs to be notified of changes.
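
A periodic recycle like the one just described could be sketched as below. This is only an illustration under assumed names (phonebook entry, drive letters, the `run` callback), not the software in question. The teardown runs in the reverse order of setup: dismount the TrueCrypt volume first, then unmap the share, then hang up the VPN.

```python
from typing import Callable, List

Command = List[str]


def tear_down_commands(
    vpn_entry: str,
    share_drive: str,
    secure_drive: str,
    truecrypt_exe: str = "TrueCrypt.exe",
) -> List[Command]:
    """Return teardown commands in reverse dependency order.

    All argument values are hypothetical; the thread does not give them.
    """
    return [
        # 1. Dismount the TrueCrypt volume: /d letter, /f force, /q quit, /s silent.
        [truecrypt_exe, "/d", secure_drive.rstrip(":"), "/f", "/q", "/s"],
        # 2. Remove the drive mapping; /y answers any confirmation prompt.
        ["net", "use", share_drive, "/delete", "/y"],
        # 3. Hang up the VPN phonebook entry.
        ["rasdial", vpn_entry, "/disconnect"],
    ]


def run_best_effort(commands: List[Command], run: Callable[[Command], int]) -> List[Command]:
    """Run every command even if an earlier one fails; return the failures.

    `run` is a caller-supplied function that executes one command and
    returns its exit code.
    """
    failed: List[Command] = []
    for cmd in commands:
        if run(cmd) != 0:
            failed.append(cmd)
    return failed
```

In practice `run` might be `lambda cmd: subprocess.call(cmd, timeout=60)`. Continuing past a failed step is a deliberate choice in this sketch: if the dismount fails because the link is already down, it still seems worth attempting the unmap and the disconnect.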

It's too early to tell whether this makes for a robust solution. Either way, this is not a "solution" as such, but might make the problem go away (e.g. if it's some kind of memory leak or something). Failure of any part of the system is not a problem in itself, but RasMan locking up requires human intervention and an annoying reboot.

We don't really want to reboot our servers, because the end that is acting as the VPN client is also acting as a server in a similar arrangement with another machine on our LAN (same mirroring technique, but with no TrueCrypt or VPN involved). Thus, if we rebooted this server we'd have a knock-on effect on a different server, if you follow me.

Not sure how "standard" our disaster recovery system is. Our client's data is sensitive, so we don't want to store it off site unencrypted. In a disaster we switch to the DR server and actively monitor its security. The big plus is that the virtual server is really cheap, but can have its RAM/disk/CPU count increased instantaneously through a control panel, so it's very cost effective.
Yes, cost effective is hard to get with this kind of solution.

So, referencing the first paragraph, you've made this change and now you've got the issue referenced in the original question?
No. Other way round. We've made the change because we had the problem.
ASKER CERTIFIED SOLUTION
digitap
Makes the problem go away, but isn't a proper solution
Thanks for the points!