Solved

SBS 2003 / Server 2003 intermittently refuses all outside connections

Posted on 2010-09-14
Last Modified: 2012-06-21

Hi there,

I'm having trouble with a Small Business Server 2003 machine intermittently refusing all outside connections. The problem is sporadic and I haven't found anything in the logs that points to a culprit. Everything will be running just fine and then suddenly all outside RDP sessions are killed and the server will refuse connections on ports 25 and 443. This brings mail/webmail to a screeching halt.

It appeared to be related to DNS (i.e., the server could not do reverse lookups on the IPs of incoming connections), so I made some changes based on some research. The problem appeared to be fixed, but it keeps coming back.
 
Changes made so far:
PIX Firewall
1. Changed maximum DNS packet size from 512 to 1518

SBS 2003 Server
in HKLM\System\CurrentControlSet\Services\DNS\Parameters
1. Added value: EnableEDNSProbes, Reg_DWORD = 0x00000000
2. Added value: EDNSCacheTimeout, Reg_DWORD = 0x00057e40

I've also verified that no other processes are using any ports needed by IPSec.

The problem still happens and lasts for intermittent amounts of time.  I have 3 other environments identical to this one (SBS 2003 behind a PIX firewall) that aren't having this issue.

Any help would be greatly appreciated.

VHCG
Question by:vhcg
22 Comments
 

Author Comment

by:vhcg
Wanted to add that all incoming connections from inside hosts are always accepted.  It's only stuff from the outside that has trouble.
 

Expert Comment

by:gounbas
If I am reading correctly, you mean connections from outside of your network to your SBS 2003?
Are you using Remote Web Workplace for the RDP connections?
Have you tried to Telnet to port 25 after the failure occurs?
 
LVL 5

Expert Comment

by:Ioannis_Avgeros
Hi there,

Is that the only traffic that gets dropped on its way to the server via the PIX? If, for example, you ran an FTP server on the server, would that connection drop along with the others?

 

Author Comment

by:vhcg

When "the problem" flares up, all active connections from the outside are terminated (usually my RDP connection -- I manage it remotely) and HTTPS connections (users using OWA).

My quick and dirty test is to telnet to port 25 from the outside. When things are working, the server responds. When things are not working, there is no response, and no response either on any of the other ports that I'm allowing through (443, 3389).

Thanks!
 
LVL 34

Expert Comment

by:Shreedhar Ette
- Run the SBS 2003 Best Practices Analyzer tool and fix the errors reported.

- Check the System Eventlog of the server for the error events from the source Srv.
 
LVL 3

Expert Comment

by:woodmouse
Also check if RRAS isn't doing some stupid things...
We had some issues on an SBS 2003 yesterday, and oddly enough I could access the network from the SBS server, but the clients couldn't access the server at all (neither ping nor telnet!).

Restarting the server didn't help...

Removing RRAS, and restarting the server again - did resolve the issue (we didn't need RRAS anyway).
 
LVL 77

Expert Comment

by:Rob Williams
Not suggesting it is not the SBS, but I have seen this happen with bad routers and bad modems. Have you tried rebooting one or the other while the problem exists to see if connections are quickly restored?
 

Author Comment

by:vhcg
Well, the problem still persists.

Based on all of the feedback (thank you), I did the following:

1. Ran the SBS Best Practices Analyzer -- no major problems found.
2. Stopped the RRAS server (but haven't removed it)

The problem is definitely worse during business hours when the server is more busy.

Doing my quick test of telnet to port 25, I'm seeing two different behaviors.

1. The connection is refused right away
2. The connection times out

I also scoured through the event logs -- nothing found.
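
The two telnet behaviors above are worth telling apart, since an immediate refusal means something answered with a reset while a timeout means nothing answered at all. Here is a minimal Python sketch of the same quick test, run from an outside host. The function name `probe` and the 5-second timeout are illustrative choices, not anything from this setup.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify one TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        # create_connection performs the same three-way handshake telnet would.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something on the path answered the SYN with a reset.
        return "refused"
    except socket.timeout:
        # No reply at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else (name lookup failure, unreachable network, ...).
        return f"error: {exc}"
```

Calling `probe(host, port)` for ports 25, 443, and 3389 every minute or so, and logging the result with a timestamp, would show whether the outages line up across all ports and whether they start as refusals or as timeouts.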

I'm fairly certain that it is not the PIX firewall since there are several servers behind it and I am able to access all of them without any trouble.

VHCG

 

Author Comment

by:vhcg

DNS appears to be part of the picture. The problem is happening right now and I can't do external lookups (by name or by IP) using nslookup on the SBS server. I've stopped and started the DNS Server service, but the lookups still time out. As forwarders, I'm using the two servers supplied to me by my ISP. I know they are working because when I configure another server on the network (which has a different public IP) to use the ISP DNS servers, the lookups work just fine.
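
To separate "the forwarder is down" from "this server cannot reach the forwarder", one option is to send a single DNS query straight at each forwarder from the SBS box and from another machine, bypassing the local DNS Server service. A minimal Python sketch using only the standard library follows. The function names, the `example.com` test name, and the timeout are assumptions for illustration.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD bit set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def forwarder_answers(server: str, name: str = "example.com",
                      port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if `server` sends back any reply matching our query ID."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as ICMP-driven errors.
            return False
    return len(reply) >= 12 and reply[:2] == query[:2]
```

If this returns False on the SBS server but True on the other server at the same moment, the forwarders themselves are fine and the fault lies somewhere on the SBS server's path to the Internet.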

This is maddening...

VHCG
 
LVL 77

Expert Comment

by:Rob Williams
I doubt DNS is the issue, but rather a side effect. PCs would use the SBS for DNS, and external DNS domains would be resolved using forwarders. If the Internet were not available, DNS could not resolve the names, as it could not access the forwarders.

Based on your updated information about the other servers it seems apparent there is a disconnect between the SBS and the router.

It is also not likely a software issue, since it is intermittent.

I assume connections between the PCs and server are maintained, and just Internet is lost?
One NIC or 2 on the SBS? If 2, it could be a bad NIC or driver.
Also, if there are 2 NICs, can you change the patch cable for the WAN NIC and the switch port? Patch cables are the #1 point of network failures, and you can have bad switch ports locking up.
If you set either the NIC or the switch port to a fixed speed and/or duplex and leave the other as auto, you can have the port freeze. Both have to be set the same; generally auto is best.

It is also possible you have a virus on the server which is consuming all bandwidth for a period of time.
 

Author Comment

by:vhcg

As an (awful) workaround, the PCs are currently set up using the SBS server as the primary DNS server and the ISP's DNS servers as the secondary and tertiary servers. I know this is not optimal and not recommended, but every time DNS would flake out on the SBS machine, the whole company would lose their Internet access and there were some very angry/upset people.

There are 2 NICs in the SBS, but one of them is currently disabled.  I will give the network settings suggestion a try.

Also, regarding the virus: I've thought about that, too. The server is protected via Trend Micro, but I've seen stuff slip through (on the workstations) that Trend Micro was not able to catch. Can anyone recommend a site where I can do a free scan of the server as a sanity check? Over the weekend, things will be quiet and I should be able to work on the server without interruption.

Thanks again for all of the input.  I appreciate it very much.

VHCG
 
LVL 77

Expert Comment

by:Rob Williams
>>"As an (awful) workaround"
That is an understatement :-) The #1 mistake made with DNS in a Windows domain is to have a router or ISP added as a second or third DNS server. DNS WILL FAIL. DNS in Windows does not work as one would expect. You would think a DNS request would be resolved by the first listed server and, if that fails, move to the second, and so forth. However, it does not work that way. As described by another expert here, it is more of a shotgun affair with the first one to respond being accepted. Thus if an ISP responds first, which is often the case, your DNS resolution will fail.
This results in all sorts of network-related issues and performance issues, many of which seem totally unrelated to DNS.
I have never understood why Internet access is so important if the SBS is offline, and with it file, printer, and e-mail access. However, that is an end user issue, not yours and mine :-)
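
The rule above (domain clients should list only internal DNS servers) is easy to check mechanically once you have each client's DNS server list. A small Python sketch follows. The function name is made up for illustration, and the two 198.51.100.x addresses are documentation placeholders standing in for the ISP's servers.

```python
import ipaddress


def external_dns_servers(dns_servers, lan_cidr):
    """Return the configured DNS servers that are NOT on the internal LAN."""
    lan = ipaddress.ip_network(lan_cidr)
    return [s for s in dns_servers if ipaddress.ip_address(s) not in lan]


# Client configured as in the workaround: SBS first, then two ISP servers.
client_dns = ["192.168.30.10", "198.51.100.1", "198.51.100.2"]
print(external_dns_servers(client_dns, "192.168.30.0/24"))
```

An empty list is the goal for domain members; anything returned is a server that cannot resolve the internal domain's names.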

Regardless this doesn't sound like DNS. Even with the mis-configuration the external connections should still remain connected.

As for the "network settings", as mentioned best to have NIC and switch (if a managed switch) set to automatic.

For the virus you might want to first try netstat -an from a command line and see if there are any "established" connections you cannot explain. As for online scans, I have used HouseCall, but it is by Trend Micro so it probably uses the same database: http://housecall.trendmicro.com/?WT.seg_2=2009HP_HouseCall
You could download free Malwarebytes. It works well  www.malwarebytes.com.
If you suspect a root kit you could try Gmer, but it tends to require more user intervention/scrutinizing.
Posting a question in the Anti-virus zone might provide better suggestions for that: http://www.experts-exchange.com/Internet/Anti-Virus/
 

Author Comment

by:vhcg

Thank you for the continued suggestions.

I ran 'netstat' as recommended and see lots of http connections that I don't quite understand.  http is not open to the Internet on this server.  The only ports open to the Internet are ports 25 (to all), 443 (to all), and 3389 (to me and a few other select trusted sites).

  TCP    mailsrv:9578           208.71.123.131:http    ESTABLISHED
  TCP    mailsrv:9579           nuq04s01-in-f165.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9580           204.2.133.99:http      ESTABLISHED
  TCP    mailsrv:9582           65.49.92.114:http      ESTABLISHED
  TCP    mailsrv:9583           208.71.125.133:http    ESTABLISHED
  TCP    mailsrv:9585           nuq04s01-in-f156.1e100.net:http  ESTABLISHED
  TCP    mailsrv:9589           nuq04s01-in-f165.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9591           scaler01-cts.netline.com:http  TIME_WAIT
  TCP    mailsrv:9597           nuq04s01-in-f148.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9599           65.49.92.129:http      TIME_WAIT
  TCP    mailsrv:9600           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9601           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9602           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9603           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9604           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9605           65.49.92.242:http      ESTABLISHED

I don't see the same behavior at other sites that are set up the same way.
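
With longer netstat listings it helps to count ESTABLISHED connections per remote host, so a single chatty endpoint stands out. A minimal Python sketch over the output pasted above follows. The parsing assumes the four-column layout shown here, and the function name is illustrative.

```python
from collections import Counter

NETSTAT_SAMPLE = """\
  TCP    mailsrv:9578           208.71.123.131:http    ESTABLISHED
  TCP    mailsrv:9579           nuq04s01-in-f165.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9580           204.2.133.99:http      ESTABLISHED
  TCP    mailsrv:9582           65.49.92.114:http      ESTABLISHED
  TCP    mailsrv:9583           208.71.125.133:http    ESTABLISHED
  TCP    mailsrv:9585           nuq04s01-in-f156.1e100.net:http  ESTABLISHED
  TCP    mailsrv:9589           nuq04s01-in-f165.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9591           scaler01-cts.netline.com:http  TIME_WAIT
  TCP    mailsrv:9597           nuq04s01-in-f148.1e100.net:http  TIME_WAIT
  TCP    mailsrv:9599           65.49.92.129:http      TIME_WAIT
  TCP    mailsrv:9600           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9601           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9602           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9603           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9604           65.49.92.242:http      ESTABLISHED
  TCP    mailsrv:9605           65.49.92.242:http      ESTABLISHED
"""


def established_by_remote(netstat_text: str) -> Counter:
    """Count ESTABLISHED TCP connections per remote host."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: protocol, local address, remote address, state.
        if len(fields) == 4 and fields[0] == "TCP" and fields[3] == "ESTABLISHED":
            remote_host = fields[2].rsplit(":", 1)[0]  # drop the port/service
            counts[remote_host] += 1
    return counts


for host, n in established_by_remote(NETSTAT_SAMPLE).most_common():
    print(f"{n:3d}  {host}")
```

On this sample, 65.49.92.242 accounts for six of the eleven established connections, which makes it the first address worth identifying.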

Thanks,
VHCG
 
LVL 77

Expert Comment

by:Rob Williams
These are outgoing connections which are allowed by default.
HTTP would imply, but not necessarily mean, that they are web pages connecting to a site or service.
Might this be the case?
65.49.92.242 is in Fremont, California, and 208.71.123.131/204.2.133.99 are in Wichita, but I have no idea what they are.
They could also be a service that updates like a DDNS service, or even Windows updates, but I am not sure of what ports they use.
As for ____1e100.net see http://www.pcmech.com/article/the-mysterious-1e100-net/
 

Author Comment

by:vhcg

Ok, I had the server moved.  There was quite a bit of trouble with it today.

Old Config:
Internet <-> PIX 506e <-> Cisco SW #1 <-> Cisco SW #2 <-> SBS Server

New Config:
Internet <-> PIX 506e <-> Netgear "dumb" L2 switch <-> SBS Server

I wanted to eliminate any possibility of "funny stuff" happening on the Cisco Switches (a pair of 3560Gs).

The complete absence of anything in the Event Viewer is baffling and makes me think that whatever is happening is external to the server. If services were randomly shutting off and connections were randomly being refused, I would expect to see at least *something* in the Event Viewer. There has been absolutely nothing beyond routine Informational messages.

VHCG

 
LVL 77

Expert Comment

by:Rob Williams
Can you confirm the following:
-Server has 1 NIC. The second is disabled, not just disconnected or not configured.
-When the problem exists, the server and clients cannot connect to the Internet, but local clients can still access file shares on the SBS
-When the problem exists, external connections such as incoming e-mail and RDP sessions are dropped
-When the problem exists, can the SBS or a client machine access a web page using the IP (bypassing DNS), such as Google http://173.194.32.104

This actually sounds like the server has two NICs, configured with two gateways and one not connected. Windows will switch to the gateway with the lower metric if the first is lost for even a split second, but it does not switch back.
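
The two-gateway theory can be checked against the routing table: more than one 0.0.0.0 route in the "Active Routes" section of `netstat -r` (or `route print`) would support it. A small Python sketch of that check follows. The sample rows are illustrative and assume the five-column Windows layout, and the function name is made up.

```python
ROUTE_SAMPLE = """\
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0     192.168.30.1    192.168.30.10      1
        127.0.0.0        255.0.0.0        127.0.0.1        127.0.0.1      1
     192.168.30.0    255.255.255.0    192.168.30.10    192.168.30.10     20
"""


def default_routes(route_text: str):
    """Return (gateway, interface, metric) for every 0.0.0.0/0.0.0.0 route."""
    routes = []
    for line in route_text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] == "0.0.0.0" and fields[1] == "0.0.0.0":
            routes.append((fields[2], fields[3], int(fields[4])))
    return routes


found = default_routes(ROUTE_SAMPLE)
print(found)
if len(found) > 1:
    print("More than one default gateway is configured")
```

Capturing the table once while things work and once during an outage, then comparing the two results, would show whether the default route really changes.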
 

Author Comment

by:vhcg

RobWill:

All of what you listed is confirmed.  When the problem is present, the client machines can talk to the SBS server and it can talk back, but the SBS server cannot do anything beyond communication on the local network.  It's like the default gateway is suddenly gone/forgotten.

However, unless there is something hidden somewhere, I do not see another default gateway.

I've attached some debugging commands from the command prompt.

The SBS server has IP address 192.168.30.10. There is a Server 2003 machine on the network with IP address 192.168.30.11. The PIX is the default gateway with IP address 192.168.30.1.
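
As a sanity check on the addressing, Python's `ipaddress` module can confirm that every host and the gateway sit in the subnet defined by the server's address and its 255.255.255.0 mask. The helper name and the host labels below are illustrative.

```python
import ipaddress


def off_subnet(hosts: dict, interface: str) -> dict:
    """Return the hosts whose address is outside the interface's subnet."""
    network = ipaddress.ip_interface(interface).network
    return {
        name: ip for name, ip in hosts.items()
        if ipaddress.ip_address(ip) not in network
    }


hosts = {
    "SBS server": "192.168.30.10",
    "Server 2003": "192.168.30.11",
    "PIX (gateway)": "192.168.30.1",
}
# Address/mask of the SBS NIC, written in netmask notation.
print(off_subnet(hosts, "192.168.30.10/255.255.255.0"))
```

An empty result means the gateway is directly reachable from the server's subnet, so a wrong mask or an off-net gateway can be ruled out.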

The server is currently inaccessible, but when it does come back up, I can put WireShark on it and do some packet captures.

Thanks,
VHCG

C:\WINDOWS\system32\drivers\etc>ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : xxxxxx
   Primary Dns Suffix  . . . . . . . : abcdefg.local
   Node Type . . . . . . . . . . . . : Unknown
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : Yes
   DNS Suffix Search List. . . . . . : abcdefg.local

Ethernet adapter Server Local Area Connection:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Broadcom NetXtreme Gigabit Ethernet
   Physical Address. . . . . . . . . : 00-C0-9F-BA-C5-4E
   DHCP Enabled. . . . . . . . . . . : No
   IP Address. . . . . . . . . . . . : 192.168.30.10
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.30.1
   DNS Servers . . . . . . . . . . . : 192.168.30.10
   Primary WINS Server . . . . . . . : 192.168.30.10

C:\WINDOWS\system32\drivers\etc>netstat -r

IPv4 Route Table
===========================================================================
Interface List
0x1 ........................... MS TCP Loopback interface
0x10003 ...00 c0 9f ba c5 4e ...... Broadcom NetXtreme Gigabit Ethernet
===========================================================================
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0     192.168.30.1    192.168.30.10      1
        127.0.0.0        255.0.0.0        127.0.0.1        127.0.0.1      1
     192.168.30.0    255.255.255.0    192.168.30.10    192.168.30.10     20
    192.168.30.10  255.255.255.255        127.0.0.1        127.0.0.1     20
   192.168.30.255  255.255.255.255    192.168.30.10    192.168.30.10     20
        224.0.0.0        240.0.0.0    192.168.30.10    192.168.30.10     20
  255.255.255.255  255.255.255.255    192.168.30.10    192.168.30.10      1
Default Gateway:      192.168.30.1
===========================================================================
Persistent Routes:
  None

C:\WINDOWS\system32\drivers\etc>ping 192.220.109.142

Pinging 192.220.109.142 with 32 bytes of data:

Request timed out.

Ping statistics for 192.220.109.142:
    Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),
Control-C
^C
C:\WINDOWS\system32\drivers\etc>tracert 192.220.109.142

Tracing route to 192.220.109.142 over a maximum of 30 hops

  1     *        *        *     Request timed out.
  2     *        *        *     Request timed out.
  3     *        *        *     Request timed out.
  4     *        *        *     Request timed out.
  5     *        *        *     Request timed out.
  6     *        *        *     Request timed out.
  7     *        *        *     Request timed out.
  8     *        *        *     Request timed out.
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.
 11     *        *     ^C
C:\WINDOWS\system32\drivers\etc>

C:\WINDOWS\system32\drivers\etc>ping 192.168.30.11

Pinging 192.168.30.11 with 32 bytes of data:

Reply from 192.168.30.11: bytes=32 time<1ms TTL=128
Reply from 192.168.30.11: bytes=32 time<1ms TTL=128
Reply from 192.168.30.11: bytes=32 time<1ms TTL=128
Reply from 192.168.30.11: bytes=32 time<1ms TTL=128

Ping statistics for 192.168.30.11:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\WINDOWS\system32\drivers\etc>ping 192.168.30.1

Pinging 192.168.30.1 with 32 bytes of data:

Reply from 192.168.30.1: bytes=32 time<1ms TTL=255
Reply from 192.168.30.1: bytes=32 time<1ms TTL=255
Reply from 192.168.30.1: bytes=32 time<1ms TTL=255
Reply from 192.168.30.1: bytes=32 time=1ms TTL=255

Ping statistics for 192.168.30.1:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 1ms, Average = 0ms

 
LVL 77

Expert Comment

by:Rob Williams
That certainly all looks OK.
Wireshark may tell you something but it takes a while to filter out what you are looking for, especially when you don't know what you are looking for :-)

If I were to stick on the default gateway theory..
Try opening the registry editor and go to:
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\
Look through the settings in each of the subfolders and see if there is a gateway listed that is wrong. Ignore any that are 0.0.0.0 or ASCII. I don't know that I would recommend deleting anything at this point, but if present, that subfolder may be a remnant of an old/ghost NIC.

 

Author Comment

by:vhcg

RobWill:  The registry entries check out ok.

All was fine last night, but trouble has started again this morning.  I checked the firewall, did packet captures, checked the DSL modem, etc, etc.

Rebooting the server has temporarily fixed it.  I don't know how long this will last since the server was rebooted at 8am this morning to apply a couple of patches (IE8, Latest 'Malicious Software Removal' tool).  

It sure feels like an intermittent loss of the default gateway, but the system shows otherwise. I've disabled 'Routing and Remote Access' to remove it from the picture. No dice there. Could IPSec be silently 'intervening' at times?

When the system is not able to get to the outside (or let the outside in), is there a particular thing I can try while I have Wireshark running? So far, capturing pings, traceroutes, etc. just shows that a packet leaves the server and does not get an answer.

Thanks,
VHCG

 
LVL 77

Expert Comment

by:Rob Williams
You mention RRAS. Do you use the VPN feature of RRAS? If so I wonder if it could be using a static address pool that conflicts somehow, or I have seen DNS "confused" by an incorrect VPN configuration while a user is connected.

I don't know what you would look for with wireshark while the connection is lost. Usually you take a capture and then start filtering out the traffic you can confirm is acceptable and look at what is left. But it may not show anything.

Another completely different thought is the possibility of an outside denial of service attack. The most common occurs when you allow/reply to ICMP (ping requests) from the Internet. This is off by default with the PIX, but might that be enabled?
 

Accepted Solution

by:vhcg

I think I got it.

I've been checking the configs of all of the network devices, and when I looked at the status of the DSL modem (which also has a built-in 4-port switch), I saw *two* ports active. Only one (the PIX) should be active.

The modem has a label on it listing the IP address, netmask, and default gateway entries.

Someone jacked into one of the ports and set himself up with a static IP...the SAME IP as the public IP of the SBS Server.

I thought I had this licked numerous times before, but I'm fairly certain that I've got it for sure this time.  
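
A duplicate IP like this usually shows up as one address being claimed by more than one MAC over time, for example in successive `arp -a` snapshots taken on a device in the same segment. A minimal Python sketch of that check follows. The observations are made-up placeholders (203.0.113.x is a documentation range), not captured data, and the function name is illustrative.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: [macs]} for every IP seen with more than one MAC address."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize so 00-AA-... and 00:aa:... count as the same MAC.
        macs_by_ip[ip].add(mac.lower().replace("-", ":"))
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical (ip, mac) pairs collected from repeated ARP snapshots.
observed = [
    ("203.0.113.10", "00-11-22-33-44-55"),  # the firewall's outside interface
    ("203.0.113.1", "00-AA-BB-CC-DD-01"),   # the ISP gateway
    ("203.0.113.10", "00:de:ad:be:ef:99"),  # a second device claiming the same IP
    ("203.0.113.10", "00:11:22:33:44:55"),
]
print(find_ip_conflicts(observed))
```

Any address returned is being answered for by two different devices, which matches the intermittent, log-free symptoms in this thread.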

I'll report back in a day.

Thanks to all who responded (especially you, RobWill for hanging in there).

VHCG
 
LVL 77

Expert Comment

by:Rob Williams
Comment Utility
>>"Someone jacked into one of the ports and setup himself up with a static IP...the SAME IP as the public IP of the SBS Server."
That would definitely create havoc. It can lock up the modem if not the router.

I have seen this at universities with students trying to use protected networks. They get an IP off an allowed machine and clone it. Better yet they often clone the MAC. Switches love that :-)

Thanks for updating. Hopefully you are on to something.