Remote Services (RDP, VNC) break randomly, while server is still pingable...

hello all,

I've been dealing with this problem for a few days now, and I'm sort of stumped. Here's the situation.
I recently installed a file server at one of my clients. The setup is pretty simple:
DSL Modem  --> LinkSys Router/AP --> Dell 24 port unmanaged switch
One Dell PowerEdge server connected straight to the Linksys router, running Windows Server 2003 Standard with DHCP, WINS, and internal DNS
All clients (6 clients) connected through the Dell Switch.

The problem I've been having is this:
I've opened up a port in the router for Remote Desktop (in administration mode) so that I can manage the box remotely. What happens is that, at random, I'll be in an RDP session and *wham*, I lose the connection.
I tried installing VNC and opening the appropriate port to see if I could isolate the problem to RDP, but when it happens, access to all server services is blocked completely.

I still haven't tried:
Opening a service other than remote access (e.g. IIS) to the internet, to see whether that service gets blocked as well.

Here's the troubleshooting I've done so far:
I've verified that when I lose remote access to the server, that I am still able to access the gateway from WAN. (the LinkSys Router) - CHECK
I've verified that when I lose remote access to the server, that the server is still live ... I was able to ping it from the router  - CHECK
I've verified that I'm not able to access the server via RDP and/or vnc from several remote workstations, to make sure it's not a problem with my client - CHECK
I've verified that Windows Firewall is turned off on the Server - CHECK
I used to have the Hamachi VPN software on there, which I initially thought might be causing the problem, so I uninstalled it - CHECK - Problem persisted.
When the problem happens, I'm usually not on site, so I've asked my client to try to connect to the server via RDP from within the LAN, and it DID work. So only connections via the WAN are failing. - CHECK
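
The outside-in checks in this list could be scripted so they can be repeated from any remote workstation. Below is a minimal sketch in Python, assuming the default ports 3389 (RDP) and 5900 (VNC). The helper names `port_open` and `probe` are made up for illustration and are not something used in this thread.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

def probe(host, ports=(3389, 5900)):
    """Map each port to True/False depending on whether it accepted a connection."""
    return {port: port_open(host, port) for port in ports}

# Usage (the host is a placeholder for the site's public address):
# print(probe("server.example.com"))
```

A False result only says the TCP handshake did not complete from where the script ran. It cannot tell a stalled port forward on the router apart from a server that stopped answering.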

Current Workaround: reboot the server, and everything is back to normal.

So far I've experienced the problem happening while I'm in a remote session, RDP or VNC.

My current guess is that the problem lies primarily with the server itself, or perhaps with the router (doubtful, though).

Thank you in advance for any help you can provide.
George KhairallahCTOAsked:
Bradley FoxLAN/WAN Systems AdministratorCommented:
I know this isn't a good answer, but it will help with your troubleshooting. I connect from work to my computer at home. I have external port 9789 forwarded to internal port 3389 on my PC with a Linksys router. I experience dropped connections all the time, and then cannot reconnect for about an hour at a time. I have flashed the firmware on my router to no avail. My roommate also experiences the same problem. We are both running new PCs with XP Pro. This leads me to believe the issue is with the Linksys.

I would suggest taking out the Linksys, replacing it with a SonicWall, and using VPN to manage the server remotely. SonicWall makes a great device, which I use at work as my firewall and VPN solution. I never get dropped connections when remoting from home to work over VPN.
George KhairallahCTOAuthor Commented:
Thanks for the reply.
This would certainly be an option that would explain the problem. although, I have to say, I have used LinkSys routers before, and I've never had an issue similar to this one before.

However, your thought does make sense in a way, since all port forwarding is stopping completely on the router. I guess I can see if there is any new firmware for the router that I can install.

I'm also trying to be there sometime when the server breaks, to try and check the routing table on the server, and see if it's broken somehow. although, if it is, then even local clients wouldn't be able to RDP into the server.

I will wait a bit before giving you points on that, as I would like to test that theory first. Meanwhile, if anyone else has any other theories on the problem, please let me know.
Thanks!!
tray_jonesCommented:
Are the forwarded ports set as application-triggered or persistent?
George KhairallahCTOAuthor Commented:
The ports are all persistent ports. There's nothing fancy in the setup.
tray_jonesCommented:
what is the router model...wrt54g?
tray_jonesCommented:
Do big single-file transfers die after the same amount of time? Anything unusual in the event logs?

George KhairallahCTOAuthor Commented:
The router model is a WRT54GD. I'm not sure of the firmware version, as I haven't been on site yet to check. I'm planning to upgrade it to the latest firmware. I also just spoke with Linksys, and they suggested that I change the MTU. I took it down from 1500 to 1492, but they also gave me a link for finding the correct MTU.
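
For anyone following along, the MTU numbers here come down to simple header arithmetic. A rough sketch in Python, assuming the DSL line uses PPPoE (the thread does not confirm this) and IPv4 headers without options. The function names are made up for illustration.

```python
ETHERNET_MTU = 1500   # standard Ethernet payload size, in bytes
PPPOE_OVERHEAD = 8    # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20      # IPv4 header without options
ICMP_HEADER = 8       # ICMP echo request header
TCP_HEADER = 20       # TCP header without options

def pppoe_mtu(link_mtu=ETHERNET_MTU):
    """MTU left for IP packets once PPPoE framing is subtracted."""
    return link_mtu - PPPOE_OVERHEAD

def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def mtu_from_ping_payload(payload):
    """Path MTU implied by the largest ping payload that got through."""
    return payload + IPV4_HEADER + ICMP_HEADER

def tcp_mss(mtu):
    """Largest TCP segment payload for a given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

print(pppoe_mtu())             # 1492
print(max_ping_payload(1492))  # 1464
print(tcp_mss(1492))           # 1452
```

A common way to find the working value on Windows is `ping -f -l <size> <host>`, where `-f` sets Don't Fragment and `-l` sets the payload size. The largest size that gets a reply, plus 28, is the usable MTU.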

As for big file transfers, I'm not really sure. I haven't really tried that, but it seems like things usually happen when I generate lots of traffic through the gateway (RDP sessions with lots of screen activity, etc.). The event logs are spankin' clean. The Incoming log on the router shows my RDP sessions, but no errors whatsoever.

Anyway, I just wanted to divert your attention a little bit AWAY from the Linksys, because if you look at the workaround in the initial post, i can fix the problem by rebooting the SERVER (not the router). very odd, I know.

so it seems like all the symptoms are pointing to the router, but yet, I can fix it by rebooting the server??
George KhairallahCTOAuthor Commented:
Oops. I just noticed a typo with the router model, it's actually WRT54GS not GD :-/
George KhairallahCTOAuthor Commented:
Ok guys, so it's been 2 days. and the server hasn't gone down once. I have not touched the firmware, but I have adjusted the MTU values on the router, and that seemed to work. at least for now .

Now for the points, I gave tray_jones the majority of the points, because when you mentioned the large file size, it triggered the MTU problem for me. and 50 to mcsween, for leading me to the linksys :)

Now I'm hoping that this was indeed the answer, and that it'll keep working. Of course, I still don't understand why rebooting the server fixed the problem when the main issue was with the MTU setting on the router. Nonetheless, I've never understood exactly how this whole MTU thing works, and I'm glad the whole setup is working. Thanks again for your assistance.

Happy New Year!
tray_jonesCommented:
Look for information about the TCP sliding window; it will give you some insight into how this works. Sorry we couldn't get you there any faster!!

If anything else comes up with it, you know where to find us.

Thanks
Happy New Year
Tray
George KhairallahCTOAuthor Commented:
Tray, Happy New Year! And thanks for the tip about the TCP sliding window. As it turns out, the problem didn't get resolved by changing the MTU. I haven't changed anything with the sliding window, for a couple of reasons:
1- I'm pretty sure it's not the problem
2- If it is the problem, there's no way I can change it on the router... I can on the server, but at this point, it looks like figuring out the correct value is kind of complicated.

So I'm pretty much back at square one, as the server and the internet connection from the inside ended up going down again. This problem is baffling me. I've set up so many small networks before, and they've always worked flawlessly; I'm not sure what's happening with this one. I'm almost at the point where I want to try rebuilding the OS on the server to see if that fixes it, because I have a strong feeling that it's something on the server (possibly DNS?) that's breaking. I'm not a DNS expert, by the way, and this is only my 3rd or 4th time setting up DNS on a network, so it is possible that something is wrong there... but I have no clue what at this point.

I'm not sure how to give more points for this question, but I would like to give an additional 250. I'll have to find out if I can still do that in this thread; otherwise, I'll open up a new question and link back to this one to assign more points for it.

Thanks again for all your help!
tray_jonesCommented:
Try lowering the MTU on the server somewhere around 1400, and maybe find an updated driver for the NIC, what is the NIC model and manufacturer?
George KhairallahCTOAuthor Commented:
Ok .. I'll go ahead and try to lower the MTU to 1400. The NIC is a Broadcom NetXtreme Gigabit BC57xx. I found that the current driver is from 01/05, and I also found that there's an update from 12/05. So next time I go to the office, I'll try upgrading that driver to see if that works.

Another thing  I found out as I was tinkering around, is that for some reason the primary IP that DNS is seeing for the server is the VPN (Hamachi) IP, which is a 5.13.x.x , as opposed to the local ip, 192.168.200.10 .. I thought that was odd, but in any case, I added that IP as an IP option for the DNS server, and on DHCP.... not sure if that'll do anything, but it can't hurt anything.
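
To spot this kind of mix-up quickly, the addresses a server registers can be sorted into "VPN" and "LAN" buckets. A small Python sketch, assuming Hamachi addresses fall in 5.0.0.0/8 (consistent with the 5.13.x.x address above). The function names are made up, and `5.13.0.1` in the example is a stand-in, since the thread does not give the full address.

```python
import ipaddress

# Assumption: Hamachi addresses of that era came from 5.0.0.0/8.
HAMACHI_NET = ipaddress.ip_network("5.0.0.0/8")

def classify(addr):
    """Label an IPv4 address as 'vpn', 'lan' or 'other'."""
    ip = ipaddress.ip_address(addr)
    if ip in HAMACHI_NET:
        return "vpn"
    if ip.is_private:
        return "lan"
    return "other"

def preferred_lan_address(addresses):
    """Return the first address classified as 'lan', or None if there is none."""
    for addr in addresses:
        if classify(addr) == "lan":
            return addr
    return None

# "5.13.0.1" is a hypothetical stand-in for the elided Hamachi address.
print(preferred_lan_address(["5.13.0.1", "192.168.200.10"]))  # 192.168.200.10
```

This only illustrates the distinction. Which address DNS actually registers first depends on the adapter binding order on the server.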
tray_jonesCommented:
<-----Eager to see what happens.....
George KhairallahCTOAuthor Commented:
Tray,

Just thought I'd post an update on what's happening. Unfortunately, I didn't have the luxury of doing trial and error on one thing at a time, so I'm not sure whether what I just did actually fixed it or not. Basically, I did 3 things:
1- I upgraded the NIC drivers on the server.
2- I moved the DHCP server from W2k3 to the Linksys router (there are only 3 clients, no big deal).
3- I turned off DNS and temporarily switched to using hosts files to see if that is the problem.

I kept the VPN client on, and since yesterday morning, the server is still up. Hooray. I'm thinking the problem was either in DNS or in the NIC driver. I suppose I can turn DNS back on, if that actually did resolve the problem, and see if the problem comes back. As I said, I'm not an expert on DNS, so there might very well be something wrong with my setup.
tray_jonesCommented:
Hopefully everything stays up for ya....seems to me if it was DNS then there would never be an established connection in the first place...but DNS is a tricky beast....
George KhairallahCTOAuthor Commented:
Aaaahhh ... it went down again!!!
I guess it wasn't DNS , nor the NIC drivers after all. or maybe the NIC itself is bad... although I can still ping the server from the router, I just can't get a route to it from the outside. I'm completely baffled.

I guess the next thing I'm going to try is to take another machine to the office that I will try to keep online the same way the server does, and see if the same problem occurs on that machine. if it does, then I can safely say that the router is faulty  (Although I still haven't upgraded the firmware of the router, which is something I might want to do as well) ...

Any more ideas, Tray?? I'm stumped....
tray_jonesCommented:
Yeah, this seems to be pretty special..... I have had SOHO routers that just lock up after a whole lot of data transfer, so it could be the router. Have you tried to RDP from a machine inside the network to the server to see if it stays up for a while? If it dies, it's the server; if it doesn't, it's the router.
George KhairallahCTOAuthor Commented:
Nope. the connection never dies from the inside . it always works.
Today, I emailed my client and told them that the server was down, and that I was going to bring in another machine and run it in parallel to see if the problem occurs there as well. They said that this time they didn't lose their internet connection, and yet I can't access the server via RDP from the outside. So I'm thinking it might be the router. I guess I'll have to think of some other stuff to try. At this point, they don't seem interested in spending much more time fixing the problem and prefer to just reboot the server, so now it's mostly to satisfy my own curiosity.... :)

I'll keep you posted....
tray_jonesCommented:
What version is the Router, I was reading linksys' website when I stumbled onto this for version 4

Linksys, A division of Cisco Systems, Inc.

Product:                WRT54GS v.4

Classification:         Firmware Release History

Firmware  Date:            9/15/2005

Release Date:           10/5/2005

Last Firmware Version: 1.05.2
_____________________________________________________________________
Firmware 1.05.2
- Fix WDS under WPA mode issue
- Fixed issue where user can choose "Shared Key" under WPA-Personal and WPA-Enterprise.
- Fixed issue where long SES push generates new SSID and key instead of reset to defaults
- Fix iDEFENSE issue
- Add SES disable function in "Wireless Advanced" page
- Fixed SES LED delay
- Fixed issue where SES SSID/security is applied even when there is no SES clients
- Fix L2TP disconnect issue
- Fix DMZ and Port forwarding issue

Firmware 1.05
- Initial release for version 4.

NOTE:  This firmware is not compatible with WRT54GS hardware version 1.0, 1.1, 2.0, 2.1 or 3.0.  Please check the bottom of your router to ensure you have the correct hardware version.

Notice the "- Fix DMZ and Port forwarding issue"

Maybe it was the router all along
George KhairallahCTOAuthor Commented:
Tray,

Good find! I actually forgot to check the version number of the router on my last visit. Next time I go there I'll have to check that, and I think I will be updating the firmware as well. Of course, the changelog doesn't show WHAT they fixed with the DMZ and port forwarding, but it sure is a good guess that it might be the problem we're seeing. Another thing I can try is to temporarily put the server in the DMZ and see if anything changes at all. It does look like a problem with port forwarding and the gateway (which is also the router), because sometimes they also lose their internet connection. And the fact that everything internal is always up tells me that it is the router. Of course, I'm still not sure why rebooting the SERVER fixes the issue... ( ??? )
George KhairallahCTOAuthor Commented:
Tray. I don't know if you're still out there.... but I wanted to see if you have any ideas here... I think I narrowed down the problem a bit. i ended up upgrading the firmware and everything on the router. and yesterday, I tried to do an nmap on the public IP, and I only got back one port open, which is the remote management interface for the router.

I was supposed to also get port 3389 and 5900 opened up to the server.
I was doing this, while the server was down. so I went into the server, and I didn't even restart it, all I did was open IE  and try to go to a website, it thought for a little bit, and after that, went right to the website, and suddenly the connection to the server was back on. my nmap results also showed all the ports that were open. so I know for a fact that there's  a problem with the server, and not the router now.
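
The before/after comparison of scan results can be reduced to a set difference. A tiny Python sketch, where `missing_forwards` is a made-up helper and `8080` stands in for the router's remote management port, which the thread does not specify.

```python
RDP_PORT = 3389
VNC_PORT = 5900

def missing_forwards(observed_open, expected=(RDP_PORT, VNC_PORT)):
    """Return, sorted, the expected forwarded ports that a scan did not report open."""
    return sorted(set(expected) - set(observed_open))

# 8080 is a hypothetical value for the router's remote management port.
print(missing_forwards([8080]))              # [3389, 5900]
print(missing_forwards([8080, 3389, 5900]))  # []
```

Running this against the open-port list from each scan, together with a timestamp, would show exactly when the forwards drop out and when they come back.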

It seems as though there's something that's making the NIC go to standby or something similar. I just went to the NIC properties, and looked in the power management, and I saw this option checked:
Allow the computer to turn off this device to save power.

I'm not sure what the timing for this is, but my guess is that the server was turning off the NIC after a while, and only activity from within the server would wake it up again. In any case, I unchecked this, and hopefully that will resolve the problem.

If you have any other ideas, please feel free to contribute :) I'm completely stumped ...
 
tray_jonesCommented:
That very easily could have been the problem. I can't seem to find anything about when it does that or how often. Eager to see what happens with it though; it sounds like that could be the problem.
George KhairallahCTOAuthor Commented:
I was thinking back to this problem, and I had a thought that might mean that the power management wasn't the problem.
When the server was down, it wasn't able to access the internet from the console, and as soon as I tried to create activity on the NIC, everything was back online. however, I did this by connecting to the server via RDP, from within the LAN, this (duh) obviously means that the NIC was actually up... otherwise I wouldn't have been able to connect. so i'm really hoping that this actually was the problem. but this last observation leads me to believe that it's not.

If the server goes back down, I'm going to try bringing up the second NIC on the server, giving it a second IP, and seeing if both NICs go down when the server goes down. I'll forward the RDP port to the IP of the first NIC and the VNC port to the IP of the second NIC. This way I'd be able to narrow the problem down to either the network card or something else. I'm at least glad to know that the router is functioning ok. Making progress, only very slow progress! :(
George KhairallahCTOAuthor Commented:
D'oh!! The server went back down today!! Time to try setting up the second NIC ...

I'm totally stumped.. I'm also having thoughts about trying to rebuild the OS. I'm thinking something might just be screwy in that build... (??)
George KhairallahCTOAuthor Commented:
After testing the NIC and enabling the second one as well. the server still went down, so now I know that it's not the network card that is the problem.
next step for me is just going to be to rebuild the OS, because now I know it's something in the OS that's screwed up. I was hoping it didn't come to that, but I don't have many more options.
Windows Server 2003
