Solved

How do I debug (and fix) an intermittent communications problem?

Posted on 2009-07-09
Last Modified: 2012-05-07
 I have a server running Windows Server 2003 R2.  This server is also running Exchange 2007 SP1.  The server runs fine for about 16 hours, then loses some of its communications abilities.

  It can still communicate with other servers on our network.  I can ping its default gateway and our VPN / firewall.    I can run a tracert to google.com and it works.

  Trying to use a web browser from this system fails as does any attempt to send email out.  

  When the system is in this state, simply rebooting it fixes the problem until it happens again.

  I am not a guru at getting under the hood in Windows and diagnosing this kind of thing by looking at log entries.  I believe that the issue is with Windows itself since other forms of communication are affected.

  What I would like is a set of tests I can run in order to determine what is causing the blockage, then what I need to do to avoid it happening in the future.
Question by:developmentguru
14 Comments
 
LVL 4

Expert Comment

by:themightydude
ID: 24815133
So using a web browser or sending email fails, but you can still ping google.com and ping equipment on your network?

Is there anything at all in the event log, in either Application or System?
 
LVL 21

Author Comment

by:developmentguru
ID: 24815487
I cannot ping google.com, it times out.  I can tracert it.

The application event log had the following error

Microsoft Exchange couldn't find a certificate that contains the domain name oa.polydeckscreen.com in the personal store on the local computer. Therefore, it is unable to support the STARTTLS SMTP verb for the connector Polydeck using polydecksceen.com with a FQDN parameter of oa.polydeckscreen.com. If the connector's FQDN is not specified, the computer's FQDN is used. Verify the connector configuration and the installed certificates to make sure that there is a certificate with a domain name for that FQDN. If this certificate exists, run Enable-ExchangeCertificate -Services SMTP to make sure that the Microsoft Exchange Transport service has access to the certificate key.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

As for the System log, the only warnings or errors are related to printers.
 
LVL 4

Expert Comment

by:themightydude
ID: 24816134
Hmm, so you can tracert google.com, but ping google.com times out. Does it resolve to an IP?

Network setup is:

Internet --> Firewall / VPN --> Switch --> Servers / computers?

What do you use for DNS servers?
 
LVL 21

Author Comment

by:developmentguru
ID: 24816731
TraceRt does resolve to an IP address.

You are correct on the network setup.

We have two internal DNS servers, both Windows Server 2003 R2.

The error I posted has occurred again since I posted it, with no associated loss of communications.
 
LVL 4

Expert Comment

by:themightydude
ID: 24816750
How long has this been happening with not being able to use a web browser to get out?

Any recent changes / upgrades?

New DNS server entries etc etc?

Does ping resolve to an IP address?
 
LVL 21

Author Comment

by:developmentguru
ID: 24817435
--How long has this been happening with not being able to use a web browser to get out?--

We just found out about it in the last couple of days.  We have had instances where the server has acted this way over the last month or so, just not this frequently.

--Any recent changes / upgrades?--
  We did make a change to the server to activate a second NIC to tie it to our SAN.  We then moved some of our files from the server hard drives to the SAN.

--Does ping resolve to an IP address?--
Ping, from the server while it is in this state, times out.

We did a little digging and found out that our security software (Panda Security) had somehow been tied to the SAN's IP address.  I could see the constant activity being viewed as an attack and the security software shutting down communications.  We have since removed it from that IP address and the server has not shut down yet.  If it is still running, continuously, this time tomorrow then it is likely solved.

One thing you can still do to earn the points is to give me some tests to run (other than what I have mentioned).  Tests that would allow me to see if SMTP can get out, or any other protocols you can think of.  Tests that will show error results would be best.
 
LVL 4

Accepted Solution

by:
themightydude earned 500 total points
ID: 24817624
Ahh, sounds like that might have caused some problems. Is it enabled on the other NIC as well?

A simple Exchange SMTP test you can try is to log in to the server, then telnet to a mail server from that server.

for example:  telnet mail.server.com 25

You should get some sort of welcome banner from the mail server you telnet into.

To make sure SMTP is working fine on your server...telnet to your mail server from any machine inside or outside of your network using the same method.

You should get some header with "Microsoft ESMTP MAIL service, Version xxxx ready at : xxxx"

After you type a helo or ehlo command, that should be followed by 250 - xxx

If you get that back, then you know the SMTP service is working alright.

Have you tried turning on diagnostic logging for SMTP on your Exchange server?

Also, you might try downloading the Microsoft Exchange troubleshooter. It can help to point out any potential or current problems as well.
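
The telnet check can also be scripted so it is easy to rerun whenever the server gets into this state. What follows is only a minimal Python sketch, not something from the original advice: it opens a TCP connection to port 25 and reads the first line of the greeting. The function names (parse_reply, smtp_banner) are made up for illustration, and mail.server.com is the same placeholder host used above.

```python
import socket


def parse_reply(line):
    """Split one SMTP reply line into (code, is_last_line, text).

    '220 host ready' -> (220, True, 'host ready')
    '250-PIPELINING' -> (250, False, 'PIPELINING'); a hyphen after the
    code means more lines of the same reply follow.
    """
    line = line.rstrip("\r\n")
    return int(line[:3]), line[3:4] != "-", line[4:]


def smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the parsed first greeting line.

    Raises OSError (refused, timed out, DNS failure) when the connection
    cannot be made, which is the failure this test is meant to expose.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("r", encoding="ascii", errors="replace") as reader:
            line = reader.readline()
    if not line:
        raise ConnectionError("connected, but the server sent no greeting")
    return parse_reply(line)


# Example call (placeholder host from the post above):
#   code, is_last, text = smtp_banner("mail.server.com")
#   A working SMTP server greets with code 220.
```

If smtp_banner raises from the server but returns a 220 greeting from another machine on the same network, the problem is on the server's outbound path and not with the remote mail server.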
 
LVL 21

Author Comment

by:developmentguru
ID: 24818197
One NIC goes to our network.  The other NIC goes to the switches (fabric) that only goes to our SAN.  The only one with the security software running now is the one that goes to our network (as it should).

Can you give me an example of an external site to try the SMTP telnet with?  This would only be used to verify that the communication is being passed.

Do you have a simple test you use to check FTP?  Sorry if I sound like a newb on all of this, for some things I am.
 
LVL 4

Assisted Solution

by:themightydude
themightydude earned 500 total points
ID: 24818330
Are you wanting to see if FTP is running on the server, or to see if you can get out via FTP from the server to another one?

If you're just checking to see if you can get out, just open this up from your server.
ftp://ftp.netscape.com/

If you can get to that then you can get out via FTP from your server.

In regards to an external site to try SMTP with:

telnet mx1.hotmail.com 25

then type in  ehlo test.com

you should get something similar to:

220 col0-mc1-f25.Col0.hotmail.com Sending unsolicited commercial or bulk e-mail to Microsoft's computer network is prohibited. Other restrictions are found at http://privacy.msn.com/Anti-spam/. Violations will result in use of equipment located in California and other states. Thu, 9 Jul 2009 14:03:36 -0700
ehlo test.com
250-col0-mc1-f25.Col0.hotmail.com (3.8.0.31) Hello [74.52.164.130]
250-SIZE 29696000
250-PIPELINING
250-8bitmime
250-BINARYMIME
250-CHUNKING
250-AUTH LOGIN
250-AUTH=LOGIN
250 OK
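
The FTP and SMTP checks above both come down to whether the server can open a TCP connection to an outside host on a given port. A rough Python sketch of that idea follows. The hosts are the ones mentioned in this thread and may no longer answer, so treat the TARGETS list as an assumption and substitute hosts known to be reachable. The helper names (can_connect, run_checks, report) are invented for this example.

```python
import socket

# (label, host, port): hosts taken from this thread; replace as needed.
TARGETS = [
    ("web", "google.com", 80),
    ("smtp", "mx1.hotmail.com", 25),
    ("ftp", "ftp.netscape.com", 21),
]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def run_checks(targets=TARGETS, connect=can_connect):
    """Try every target and return {label: True/False}."""
    return {label: connect(host, port) for label, host, port in targets}


def report(results):
    """Format the results as one 'label OK/FAILED' line per target."""
    return "\n".join(
        f"{label:<5} {'OK' if ok else 'FAILED'}" for label, ok in results.items()
    )


# Example: print(report(run_checks()))
```

Running the same script from the server and from a workstation gives a side-by-side view of which outbound protocols are blocked only on the server.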
 
LVL 21

Author Comment

by:developmentguru
ID: 24823936
Thanks for some of the testing tools, here is the latest.  The last time the server got into this state I tried some of the tests.  Web requests would not go out on the server but worked well from any other system we tested.  I could ping internal addresses but not external (from the server in question).  Pinging external (or any of the other tests) worked fine from a Windows XP system on the network.  I was able to send myself an email from hotmail and receive it in house, but SMTP from the server would not function going out.    Tracert timed out.  FTP tests worked from other systems, not the server.  I could do the telnet SMTP test to our Exchange server in house and it worked.  Hopefully this info gives you a place to start...
 
LVL 21

Author Comment

by:developmentguru
ID: 24823973
I was also able to use Outlook Anywhere web access to get into emails.  It would allow me to send internally and would queue anything I tried to send externally.
 
LVL 4

Expert Comment

by:themightydude
ID: 24824050
hmmmm...this is very strange.

It's going to be something specific with that server then since I assume all your other workstations and what not use the same firewall and DNS servers as the server.

To sum up this problem..anything inside of your network is fine..you can talk to anything on your network from that server..but if you try to talk to a computer outside of your network from that server, you get nothing.

When it does this again...disable the security software on the network facing NIC..just for a few minutes.

I assume this server has a static IP, correct?

You might also do a  route print from the server before the problem happens, and then again when the problem is occurring.

Also, if none of the above helps when this happens, try disabling then re-enabling the NIC instead of rebooting the server. For the hell of it, you might try resetting the TCP/IP stack...

 netsh int ip reset c:\resetlog.txt

Is there anything in the firewall logs about blocking outbound traffic from that server's IP?
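
Comparing the two route print outputs by eye is error-prone, so the before/after comparison suggested above could be scripted. This is a sketch only: route_snapshot shells out to the Windows route print command, and changed_routes is a made-up helper that keeps just the lines that differ between two snapshots.

```python
import difflib
import subprocess


def route_snapshot():
    """Return the text printed by the Windows 'route print' command."""
    done = subprocess.run(
        ["route", "print"], capture_output=True, text=True, check=True
    )
    return done.stdout


def changed_routes(before, after):
    """Return only the lines added ('+ ') or removed ('- ') between snapshots."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Example: save route_snapshot() while the server is healthy, take another
# when the problem occurs, then print each line of changed_routes(old, new).
```

An empty result means the routing table did not change between the two snapshots, which would point away from routing as the cause.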
 
LVL 21

Author Comment

by:developmentguru
ID: 24826108
Thanks for all of your advice, I will add this to my knowledgebase as you have given me some new tricks to try.  I had someone from Panda Security remote in and look around.  What we found is this:  I was right to suspect the other NIC but wrong as to why.  The second NIC (that runs directly to the SAN switches) had the default gateway set up. For whatever mysterious MS reason this worked well for several weeks.  Just recently MS decided to try rerouting the network traffic through the SAN!  We removed the default gateway from Local Connection 2 and all went back to normal.  I will flag the posts you put on here that I found useful as the solution (it has worth to me in future similar situations).  I wrote this to be sure the fix was included for anyone trying to find it in the future.

Do not put a default gateway on a NIC unless you want traffic rerouted through it!  This is, I am sure, obvious to everyone who has been in networking any period of time.  It is not obvious to a programmer like myself.
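
For anyone wanting to check for this condition ahead of time, here is a small Python sketch that scans route print output for default routes (destination 0.0.0.0, netmask 0.0.0.0) and reports when there is more than one. It assumes the usual five-column IPv4 layout of route print (destination, netmask, gateway, interface, metric). The function names and the sample addresses in the comments are illustrative and are not taken from our network.

```python
def default_routes(route_print_output):
    """Return (gateway, interface, metric) for each default route found."""
    routes = []
    for line in route_print_output.splitlines():
        fields = line.split()
        # A default route row looks like:
        #   0.0.0.0  0.0.0.0  <gateway>  <interface>  <metric>
        if len(fields) == 5 and fields[0] == "0.0.0.0" and fields[1] == "0.0.0.0":
            routes.append((fields[2], fields[3], int(fields[4])))
    return routes


def has_competing_gateways(route_print_output):
    """True when more than one default route is present."""
    return len(default_routes(route_print_output)) > 1


# Example with made-up addresses:
#   text = "0.0.0.0 0.0.0.0 192.168.1.1 192.168.1.10 10\n" \
#          "0.0.0.0 0.0.0.0 10.10.0.1 10.10.0.20 10"
#   has_competing_gateways(text) is True: two NICs each carry a default gateway.
```

With equal metrics, either gateway may be chosen for outbound traffic, so a second default route on a SAN-only NIC is worth removing even if things appear to work.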
 
LVL 4

Expert Comment

by:themightydude
ID: 24826153
Glad you got it figured out.

That actually is new information to me as well...I would have assumed different default gateways on 2 or more NICs would not have affected anything. Especially since one is on one network, and the other on a different network.
