Solved

SBS 2003 / Server 2003 intermittently refuses all outside connections

Posted on 2010-09-14
Last Modified: 2012-06-21

Hi there,

I'm having trouble with a Small Business Server 2003 machine intermittently refusing all outside connections.  The problem is sporadic and I haven't found anything in the logs that points to a culprit.  Everything will be running just fine and then suddenly all outside RDP sessions are killed and the server will refuse connections on port 25 and 443.  This brings mail/webmail to a screeching halt.

It appeared to be related to DNS (i.e., the server could not do reverse lookups on the IPs of incoming connections), so I made some changes based on some research.  The problem appeared to be fixed, but it keeps coming back.
 
Changes made so far:
PIX Firewall
1. Changed maximum DNS packet size from 512 to 1518

SBS 2003 Server
in HKLM\System\CurrentControlSet\Services\DNS\Parameters
1. Added value: EnableEDNSProbes, REG_DWORD = 0x00000000
2. Added value: EDNSCacheTimeout, REG_DWORD = 0x00057e40
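
For reference, here is a quick conversion of the EDNSCacheTimeout value above into friendlier units. This is only a sketch: it assumes the DNS server interprets the value as a number of seconds, and the variable names are made up for illustration.

```python
# Convert the EDNSCacheTimeout DWORD from the post into friendlier units.
# Assumption: the DNS Server service reads this value as seconds.
EDNS_CACHE_TIMEOUT = 0x00057E40

seconds = EDNS_CACHE_TIMEOUT
hours = seconds / 3600

print(f"{seconds} seconds = {hours:g} hours")
```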

I've also verified that no other processes are using any ports needed by IPSec.

The problem still happens and lasts for intermittent amounts of time.  I have 3 other environments identical to this one (SBS 2003 behind a PIX firewall) that aren't having this issue.

Any help would be greatly appreciated.

VHCG
Question by:vhcg
22 Comments
 

Author Comment

by:vhcg
ID: 33677710
Wanted to add that all incoming connections from inside hosts are always accepted.  It's only stuff from the outside that has trouble.
0
 

Expert Comment

by:gounbas
ID: 33677768
If I am reading correctly, you mean connections from outside of your network to your SBS 2003?
Are you using Remote Web Workplace for the RDP connections?
Have you tried to Telnet to port 25 after the failure occurs?
0
 
LVL 5

Expert Comment

by:Ioannis_Avgeros
ID: 33677786
Hi there,

is that the only traffic that gets dropped on its way to the server via the pix? If for example you run an ftp server on the server would that connection drop along with the others?

0

 

Author Comment

by:vhcg
ID: 33677840

When "the problem" flares up, all active connections from the outside are terminated (usually my RDP connection -- I manage it remotely) and HTTPS connections (users using OWA).

My quick and dirty test is to  telnet to port 25 from the outside.  When things are working, the server responds.  When things are not working, there is no response...no response either on any of the other ports that I'm allowing through (443, 3389).

Thanks!
0
 
LVL 34

Expert Comment

by:Shreedhar Ette
ID: 33677845
- Run the SBS 2003 Best Practices Analyzer tool and fix the errors reported.

- Check the System Eventlog of the server for the error events from the source Srv.
0
 
LVL 3

Expert Comment

by:woodmouse
ID: 33678054
Also check if RRAS isn't doing some stupid things...
We had some issues on an SBS 2003 yesterday, and oddly enough I could access the network from the SBS server, but the clients couldn't access the server at all (nor a ping, nor telnet !).

Restarting the server, didn't help...

Removing RRAS, and restarting the server again - did resolve the issue (we didn't need RRAS anyway).
0
 
LVL 77

Expert Comment

by:Rob Williams
ID: 33678413
Not suggesting it is not the SBS, but I have seen this happen with bad routers and bad modems. Have you tried rebooting one or the other when the problem exists to see if connections are quickly restored?
0
 

Author Comment

by:vhcg
ID: 33703821
Well, the problem still persists.

Based on all of the feedback (thank you), I did the following:

1. Ran the SBS Best Practices Analyzer -- no major problems found.
2. Stopped the RRAS server (but haven't removed it)
3. Scoured through the event logs -- nothing found

The problem is definitely worse during business hours when the server is busier.

Doing my quick test of telnet to port 25, I'm seeing two different behaviors.

1. The connection is refused right away
2. The connection times out
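
The refused-versus-timed-out distinction in the telnet test can be captured with a small script run from an outside host. This is a minimal sketch, not a definitive tool: `probe_port` is a made-up helper name, and the host name in the usage comment is a placeholder for the server's public address.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Attempt a TCP connect and classify the result.

    Returns "open", "refused" (the peer answered with a reset),
    "timeout" (nothing came back), or "error: ..." for anything else.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


# Example usage (placeholder host; the ports are the ones opened on the PIX):
#   for port in (25, 443, 3389):
#       print(port, probe_port("mail.example.com", port))
```

Logging the result with a timestamp every minute or so would show whether the outages line up with business hours and whether "refused" and "timeout" come in separate episodes.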

I'm fairly certain that it is not the PIX firewall since there are several servers behind it and I am able to access all of them without any trouble.

VHCG

0
 

Author Comment

by:vhcg
ID: 33704005

DNS appears to be part of the picture.  The problem is happening right now and I can't do external lookups (by name or by IP) using nslookup on the SBS server.  I've stopped and started the DNS Server  service, but the lookups still timeout.  As forwarders, I'm using the 2 servers supplied to me by my ISP.  I know they are working because when I configure another server (which has a different public IP) on the network to use the ISP DNS servers, the lookups work just fine.
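
One way to repeat the nslookup check against each forwarder directly, without going through the local DNS Server service, is to send a single A-record query over UDP. The sketch below builds the query by hand with the standard library only; `build_query` and `probe_dns` are hypothetical helper names, and the forwarder address in the usage comment is a placeholder, since the ISP's servers are not given in this thread.

```python
import random
import socket
import struct


def build_query(name, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    txid = random.randint(0, 0xFFFF)
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return txid, header + qname + struct.pack("!HH", qtype, 1)


def probe_dns(server, name="example.com", timeout=3.0):
    """Send one query to `server` and report "ok", "timeout", or a failure."""
    txid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 53))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error: {exc}"
    if len(data) < 12:
        return "malformed reply"
    reply_id, flags = struct.unpack("!HH", data[:4])
    if reply_id != txid:
        return "mismatched id"
    rcode = flags & 0x000F
    return "ok" if rcode == 0 else f"rcode {rcode}"


# Example usage (placeholder forwarder address):
#   print(probe_dns("192.0.2.53"))
```

If this times out from the SBS while the same query works from another server behind the same firewall, that points at the path from the SBS to the Internet rather than at the forwarders.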

This is maddening...

VHCG
0
 
LVL 77

Expert Comment

by:Rob Williams
ID: 33705055
I doubt DNS is the issue, but rather a side effect. PC's would use the SBS for DNS, and external DNS domains would be resolved using forwarders. If the internet were not available, DNS could not resolve the names, as it could not access the forwarders.

Based on your updated information about the other servers it seems apparent there is a disconnect between the SBS and the router.

It also is not likely a software issue where it is intermittent.

I assume connections between the PC's and server are maintained, just internet is lost?
One NIC or 2 on the SBS? If 2 it could be a bad NIC or driver.
Also if 2 NIC's can you change the patch cable for the WAN NIC and switch port? Patch cables are the #1 point of network failures and you can have bad switch ports locking up.
If you set either the NIC or switch port to a fixed speed and/or duplex and leave the other as auto, you can have the port freeze. Both have to be set the same, generally auto is best.

It is also possible you have a virus on the server which is consuming all bandwidth for a period of time.
0
 

Author Comment

by:vhcg
ID: 33705294

As an (awful) workaround, the PCs are currently set up using the SBS server as the primary DNS server and the ISP's DNS servers as the secondary and tertiary servers.  I know this is not optimal and not recommended, but every time DNS would flake out on the SBS machine, the whole company would lose their Internet access and there were some very angry/upset people.  

There are 2 NICs in the SBS, but one of them is currently disabled.  I will give the network settings suggestion a try.

Also, regarding the virus.  I've thought about that, too.  The server is protected via Trend Micro, but I've seen stuff slip in (on the workstations) that Trend Micro was not able to catch.  Can anyone recommend a site where I can do a free scan of the server as a sanity check?  Over the weekend, things will be quiet and I should be able to work on the server without interruption.  

Thanks again for all of the input.  I appreciate it very much.

VHCG
0
 
LVL 77

Expert Comment

by:Rob Williams
ID: 33705422
>>"As an (awful) workaround"
That is an understatement :-) The #1 mistake made with DNS in a Windows domain is to have a router or ISP added as a second or third DNS server. DNS WILL FAIL. DNS in Windows does not work as one would expect. You would think a DNS request would be resolved by the first listed server and, if that fails, move to the second, and so forth. However, it does not work that way. As described by another expert here, it is more of a shotgun affair with the first one to respond being accepted. Thus if an ISP responds first, which is often the case, your DNS resolution will fail.
This results in all sorts of network related issues and performance issue, many of which seem totally unrelated to DNS.
I have never understood if the SBS is off line, and thus file, printer, and e-mail access, why Internet access is so important. However that is an end user issue not yours and mine :-)

Regardless this doesn't sound like DNS. Even with the mis-configuration the external connections should still remain connected.

As for the "network settings", as mentioned best to have NIC and switch (if a managed switch) set to automatic.

For the virus you might want to first try  netstat -an   from a command line and see if there are any "established" connections you cannot explain. As for online scans, I have used HouseCall, but it is by Trend Micro so it probably uses the same database  http://housecall.trendmicro.com/?WT.seg_2=2009HP_HouseCall
You could download free Malwarebytes. It works well  www.malwarebytes.com.
If you suspect a root kit you could try Gmer, but it tends to require more user intervention/scrutinizing.
Posting a question in the Anti-virus zone might provide better suggestions for that: http://www.experts-exchange.com/Internet/Anti-Virus/
0
 

Author Comment

by:vhcg
ID: 33713957

Thank you for the continued suggestions.

I ran 'netstat' as recommended and see lots of http connections that I don't quite understand.  http is not open to the Internet on this server.  The only ports open to the Internet are ports 25 (to all), 443 (to all), and 3389 (to me and a few other select trusted sites).

  TCP    mailsrv:9578           208.71.123.131:http    ESTABLISHED
  TCP    mailsrv:9579           nuq04s01-in-f165.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9580           204.2.133.99:http      ESTABLISHED
  TCP    mailsrv:9582           65.49.92.114:http      ESTABLISHED
  TCP    mailsrv:9583           208.71.125.133:http    ESTABLISHED
  TCP    mailsrv:9585           nuq04s01-in-f156.1e100.net:http  ESTABLISHED
  TCP    mailsrv:9589           nuq04s01-in-f165.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9591           scaler01-cts.netline.com:http  TIME_WAIT
  TCP    mailsrv:9597           nuq04s01-in-f148.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9599           65.49.92.129:http      TIME_WAIT
  TCP    mailsrv:9600           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9601           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9602           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9603           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9604           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9605           65.49.92.242:http      ESTABLISHED

I don't see the same behavior at other sites that are set up the same way.  
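
To make a long netstat listing like this easier to eyeball, the ESTABLISHED entries can be tallied per remote endpoint. The following is a rough sketch; `established_by_remote` is a made-up helper name, and the sample text is a few lines copied from the output above.

```python
from collections import Counter

# A few lines copied from the netstat output in this post.
SAMPLE = """\
  TCP    mailsrv:9578           208.71.123.131:http    ESTABLISHED
  TCP    mailsrv:9579           nuq04s01-in-f165.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9600           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9601           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9602           65.49.92.242:http      ESTABLISHED
"""


def established_by_remote(netstat_text):
    """Count ESTABLISHED TCP connections per (remote host, remote port)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected layout: protocol, local address, remote address, state.
        if len(fields) < 4 or fields[0] != "TCP" or fields[3] != "ESTABLISHED":
            continue
        remote_host, _, remote_port = fields[2].rpartition(":")
        counts[(remote_host, remote_port)] += 1
    return counts


for (host, port), n in established_by_remote(SAMPLE).most_common():
    print(f"{n:3d}  {host}:{port}")
```

A remote address holding many simultaneous connections (as 65.49.92.242 does here) is the first thing worth tracing back to a process, for example with netstat's -o switch to show the owning PID.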

Thanks,
VHCG
0
 
LVL 77

Expert Comment

by:Rob Williams
ID: 33717020
These are outgoing connections which are allowed by default.
Http would imply, but does not necessarily mean, that they are web pages connecting to a site or service.
Might this be the case?
65.49.92.242 is in Fremont, California, and 208.71.123.131/204.2.133.99 in Wichita, but no idea what they are.
They could also be a service that updates like a DDNS service, or even Windows updates, but I am not sure of what ports they use.
As for ____1e100.net see http://www.pcmech.com/article/the-mysterious-1e100-net/
0
 

Author Comment

by:vhcg
ID: 33721611

Ok, I had the server moved.  There was quite a bit of trouble with it today.

Old Config:
Internet <-> PIX 506e <-> Cisco SW #1 <-> Cisco SW #2 <-> SBS Server

New Config:
Internet <-> PIX 506e <-> Netgear "dumb" L2 switch <-> SBS Server

I wanted to eliminate any possibility of "funny stuff" happening on the Cisco Switches (a pair of 3560Gs).

The complete absence of anything in the Event Viewer is baffling and makes me think that whatever is happening is external to the server.  If services were randomly shutting off and connections were randomly being refused, I would expect to see at least *something* in the Event Viewer.  There has been absolutely nothing beyond routine Informational messages.

VHCG

0
 
LVL 77

Expert Comment

by:Rob Williams
ID: 33724572
Can you confirm the following:
-Server has 1 NIC. Second is disabled, not just disconnected or not configured.
-When the problem exists the server and clients cannot connect to the Internet but local clients can still access file shares on the SBS
-When the problem exists external connections such as incoming e-mail, and RDP sessions are dropped
-When the problem exists can the SBS or a client machine access a web page using the IP (bypassing DNS), such as Google  http://173.194.32.104

This actually sounds like the server has two NIC's, configured with two gateways and one not connected. Windows will switch to the gateway with the lower metric if the first is lost for even a split second, but it does not switch back.
0
 

Author Comment

by:vhcg
ID: 33729698

RobWill:

All of what you listed is confirmed.  When the problem is present, the client machines can talk to the SBS server and it can talk back, but the SBS server cannot do anything beyond communication on the local network.  It's like the default gateway is suddenly gone/forgotten.

However, unless there is something hidden somewhere, I do not see another default gateway.

I've attached some debugging commands from the command prompt.

The SBS server has IP address 192.168.30.10.  There is a Server 2003 machine on the network with IP address 192.168.30.11.  The PIX is the default gateway with IP address 192.168.30.1.

The server is currently inaccessible, but when it does come back up, I can put WireShark on it and do some packet captures.

Thanks,
VHCG

C:\WINDOWS\system32\drivers\etc>ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : xxxxxx
   Primary Dns Suffix  . . . . . . . : abcdefg.local
   Node Type . . . . . . . . . . . . : Unknown
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : Yes
   DNS Suffix Search List. . . . . . : abcdefg.local

Ethernet adapter Server Local Area Connection:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Broadcom NetXtreme Gigabit Ethernet
   Physical Address. . . . . . . . . : 00-C0-9F-BA-C5-4E
   DHCP Enabled. . . . . . . . . . . : No
   IP Address. . . . . . . . . . . . : 192.168.30.10
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.30.1
   DNS Servers . . . . . . . . . . . : 192.168.30.10
   Primary WINS Server . . . . . . . : 192.168.30.10

C:\WINDOWS\system32\drivers\etc>netstat -r

IPv4 Route Table
===========================================================================
Interface List
0x1 ........................... MS TCP Loopback interface
0x10003 ...00 c0 9f ba c5 4e ...... Broadcom NetXtreme Gigabit Ethernet
===========================================================================
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0     192.168.30.1    192.168.30.10      1
        127.0.0.0        255.0.0.0        127.0.0.1        127.0.0.1      1
     192.168.30.0    255.255.255.0    192.168.30.10    192.168.30.10     20
    192.168.30.10  255.255.255.255        127.0.0.1        127.0.0.1     20
   192.168.30.255  255.255.255.255    192.168.30.10    192.168.30.10     20
        224.0.0.0        240.0.0.0    192.168.30.10    192.168.30.10     20
  255.255.255.255  255.255.255.255    192.168.30.10    192.168.30.10      1
Default Gateway:      192.168.30.1
===========================================================================
Persistent Routes:
  None

C:\WINDOWS\system32\drivers\etc>ping 192.220.109.142

Pinging 192.220.109.142 with 32 bytes of data:

Request timed out.

Ping statistics for 192.220.109.142:
    Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),
Control-C
^C
C:\WINDOWS\system32\drivers\etc>tracert 192.220.109.142

Tracing route to 192.220.109.142 over a maximum of 30 hops

  1     *        *        *     Request timed out.
  2     *        *        *     Request timed out.
  3     *        *        *     Request timed out.
  4     *        *        *     Request timed out.
  5     *        *        *     Request timed out.
  6     *        *        *     Request timed out.
  7     *        *        *     Request timed out.
  8     *        *        *     Request timed out.
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.
 11     *        *     ^C
C:\WINDOWS\system32\drivers\etc>

C:\WINDOWS\system32\drivers\etc>ping 192.168.30.11

Pinging 192.168.30.11 with 32 bytes of data:

Reply from 192.168.30.11: bytes=32 time<1ms TTL=128
Reply from 192.168.30.11: bytes=32 time<1ms TTL=128
Reply from 192.168.30.11: bytes=32 time<1ms TTL=128
Reply from 192.168.30.11: bytes=32 time<1ms TTL=128

Ping statistics for 192.168.30.11:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\WINDOWS\system32\drivers\etc>ping 192.168.30.1

Pinging 192.168.30.1 with 32 bytes of data:

Reply from 192.168.30.1: bytes=32 time<1ms TTL=255
Reply from 192.168.30.1: bytes=32 time<1ms TTL=255
Reply from 192.168.30.1: bytes=32 time<1ms TTL=255
Reply from 192.168.30.1: bytes=32 time=1ms TTL=255

Ping statistics for 192.168.30.1:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 1ms, Average = 0ms
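
To check the two-gateway theory mechanically, the "Active Routes" section can be scanned for default routes (destination and netmask both 0.0.0.0). This is a minimal sketch; `default_routes` is a hypothetical helper name and the sample is copied from the netstat -r output above.

```python
# Active routes copied from the netstat -r output in this post.
SAMPLE = """\
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0     192.168.30.1    192.168.30.10      1
        127.0.0.0        255.0.0.0        127.0.0.1        127.0.0.1      1
     192.168.30.0    255.255.255.0    192.168.30.10    192.168.30.10     20
    192.168.30.10  255.255.255.255        127.0.0.1        127.0.0.1     20
Default Gateway:      192.168.30.1
"""


def default_routes(route_text):
    """Return (gateway, interface, metric) for every default route found."""
    routes = []
    for line in route_text.splitlines():
        fields = line.split()
        # Route rows have five columns; a default route is 0.0.0.0/0.0.0.0.
        if len(fields) == 5 and fields[0] == "0.0.0.0" and fields[1] == "0.0.0.0":
            routes.append((fields[2], fields[3], int(fields[4])))
    return routes


routes = default_routes(SAMPLE)
print(routes)
if len(routes) > 1:
    print("More than one default gateway is configured")
```

With the table posted here only a single default route via 192.168.30.1 turns up, which matches what ipconfig shows.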


0
 
LVL 77

Expert Comment

by:Rob Williams
ID: 33730903
That certainly all looks OK.
Wireshark may tell you something but it takes a while to filter out what you are looking for, especially when you don't know what you are looking for :-)

If I were to stick on the default gateway theory..
Try opening the registry editor and go to:
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\
Look through the settings in each of the subfolders and see if there is a gateway listed that is wrong. Ignore any that are 0.0.0.0 or ASCII. I don't know that I would recommend deleting at this point, but if present that subfolder may be a remnant of an old/ghost NIC.

0
 

Author Comment

by:vhcg
ID: 33738185

RobWill:  The registry entries check out ok.

All was fine last night, but trouble has started again this morning.  I checked the firewall, did packet captures, checked the DSL modem, etc, etc.

Rebooting the server has temporarily fixed it.  I don't know how long this will last since the server was rebooted at 8am this morning to apply a couple of patches (IE8, Latest 'Malicious Software Removal' tool).  

It sure feels like an intermittent loss of the default gateway, but the system shows otherwise.  I've disabled 'Routing and Remote Access' to remove it from the picture.  No dice there.  Could IPSec be silently 'intervening' at times ?

When the system is not able to get to the outside (or let the outside in) Is there a particular thing I can try while I have Wireshark running?  So far, capturing of pings, traceroutes, etc, just shows that a packet leaves the server and does not get an answer.

Thanks,
VHCG

0
 
LVL 77

Expert Comment

by:Rob Williams
ID: 33739257
You mention RRAS. Do you use the VPN feature of RRAS? If so I wonder if it could be using a static address pool that conflicts somehow, or I have seen DNS "confused" by an incorrect VPN configuration while a user is connected.

I don't know what you would look for with wireshark while the connection is lost. Usually you take a capture and then start filtering out the traffic you can confirm is acceptable and look at what is left. But it may not show anything.

Another completely different thought is the possibility of an outside denial of service attack. The most common occurs when you allow/reply to ICMP (ping requests) from the Internet. This is off by default with the PIX, but might that be enabled?
0
 

Accepted Solution

by:
vhcg earned 0 total points
ID: 33739264

I think I got it.

I've been checking the configs of all of the network devices, and when I looked at the status of the DSL modem (which also has a built-in 4-port switch), I saw *two* ports active.  Only one (the PIX) should be active.  

The modem has a label on it listing the IP address, netmask, and default gateway entries.

Someone jacked into one of the ports and set himself up with a static IP...the SAME IP as the public IP of the SBS Server.
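
For anyone chasing something similar: a duplicate address can sometimes be spotted from a machine on the same segment as the two devices (here that would be the modem side of the PIX, not the SBS itself) by noting which MAC address answers for an IP over time, for example from repeated arp -a snapshots. The sketch below only does the bookkeeping; `find_ip_conflicts` is a made-up name, and the addresses are documentation-range placeholders, not values from this network.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Given (ip, mac) pairs seen over time, return IPs claimed by 2+ MACs."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalise Windows-style 00-C0-... and Unix-style 00:c0:... notation.
        macs_by_ip[ip].add(mac.lower().replace("-", ":"))
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations: the same public IP answers with two MACs.
seen = [
    ("203.0.113.10", "00-00-5E-00-53-01"),
    ("203.0.113.1", "00-00-5E-00-53-FF"),
    ("203.0.113.10", "00:00:5e:00:53:02"),
    ("203.0.113.10", "00-00-5E-00-53-01"),
]
print(find_ip_conflicts(seen))
```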

I thought I had this licked numerous times before, but I'm fairly certain that I've got it for sure this time.  

I'll report back in a day.

Thanks to all who responded (especially you, RobWill for hanging in there).

VHCG
0
 
LVL 77

Expert Comment

by:Rob Williams
ID: 33739831
>>"Someone jacked into one of the ports and setup himself up with a static IP...the SAME IP as the public IP of the SBS Server."
That would definitely create havoc. It can lock up the modem if not the router.

I have seen this at universities with students trying to use protected networks. They get an IP off an allowed machine and clone it. Better yet they often clone the MAC. Switches love that :-)

Thanks for updating. Hopefully you are on to something.
0


Question has a verified solution.
