
Solved

Lan connectivity problems

Posted on 2007-10-04
30
Medium Priority
1,094 Views
Last Modified: 2013-01-27
Hello, we have been experiencing several LAN connectivity problems in my company for about the last two months. More specifically, most users randomly lose their connection to several servers throughout the day. Also please note that the connectivity problems occur with only one server at a time. We have Windows 2003 Active Directory with DNS and WINS server, several Windows 2000 and 2003 servers (these servers are domain members) and Windows 2000, Vista and XP clients. As a firewall, we are using Endian Firewall. We have already checked that it is not a network hardware problem (switches, cables). The event logs of the servers don't show any suspicious entries. We have also checked each machine for viruses. Please help!!!
Question by:vassot
30 Comments
 
LVL 4

Expert Comment

by:msguru
ID: 20013275
When a user has connectivity issues to a particular server, can that PC ping that server's IP address, and can that PC also ping the default gateway?  Are the users on the same network as the servers, or is there a layer-3 device between them?  What make/model are the networking devices, and what network cards do the PCs have?  Is there a similarity between the users or the PCs that have the problem (e.g. same model PC, or they are on the same switch/hub)?
0
 
LVL 70

Expert Comment

by:KCTS
ID: 20013309
It would be useful to see the results of an ipconfig /all on a machine that is experiencing an issue.
0
 
LVL 63

Expert Comment

by:SysExpert
ID: 20014170
In addition, did you try to access the servers by IP rather than via WINS?

If all machines are in one location, I would eliminate WINS totally, as it is not needed in a Native win 2k or newer environment.

Also check your DNS and AD via the Server resource kit tools.

I hope this helps !
0
 
LVL 26

Expert Comment

by:Fred Marshall
ID: 20014551
I would reexamine the assumptions even if you think you have conclusive evidence.
- only one server is lost at a time is very suspicious.   What if this hypothesis were changed?  Would your diagnosis of the situation be modified to any benefit?  I'll bet it will point more to hardware.
- that hardware is not the problem is very suspicious.  Bad cables are often hard to pin down.  Bad switches or switches in need of a reboot can be really hard to pin down - often only "proven" when a reboot solves the problem!

So, reboot the switches and routers for sure.  Routers and switches in need of a reboot can do very strange things - including route to some addresses and not to others.  Note that switches have a computer inside just as well as routers do.  Consider the effect an intermittent cable might have on the rules that a switch develops....

Check the firmware versions on the routers to see if upgrades are available.  If so, install them unless there is good reason to the contrary.  You will know better than I.

Examine the physical quality of the cabling.  Except for local patch cables (which are throw-away items), are there any "fixed" cables that aren't terminated with a punch-down block?  If so, they are suspect.   Plug-terminations are really only acceptable for short, throw-away, patch cords at the computers and printers and routers, etc.
 
Next, assuming that such termination situations do exist (because they very often do in small offices) what is the workmanship of the non punch-down cable ends?  Is the insulation crimped into the connector providing strain relief?  If not, replace the plugs with proper workmanship - or better yet, with punch-down terminations.  [I have reworked entire facilities with problems like this - and they *did* have intermittencies that were unexplained!]
Replace any implicated patch cables or ones that are showing signs of wear / abuse.

I understand that this is counter to the "given information" but it's just too common a root cause not to mention it.
0
 

Author Comment

by:vassot
ID: 20019835
KCTS, this is what ipconfig /all outputs:

Windows IP Configuration

      Host Name . . . . . . . . . . . . : Tsiartsioni
      Primary DNS Suffix  . . . . . . . : anko.gr
      Node Type . . . . . . . . . . . . : Hybrid
      IP Routing Enabled. . . . . . . . : No
      WINS Proxy Enabled. . . . . . . . : No
      DNS Suffix Search List. . . . . . : anko.gr

Ethernet adapter Local Area Connection :

      Connection-specific DNS Suffix  . :
      Description . . . . . . . . . . . : NVIDIA nForce Networking Controller
      Physical Address. . . . . . . . . : 00-13-D3-12-26-A5
      DHCP Enabled. . . . . . . . . . . : No
      IP Address. . . . . . . . . . . . : 192.168.0.177
      Subnet Mask . . . . . . . . . . . : 255.255.255.0
      Default Gateway . . . . . . . . . : 192.168.0.127
      DNS Servers . . . . . . . . . . . : 192.168.0.192
      Primary WINS Server . . . . . . . : 192.168.0.192


MSGuru, I still don't have all the answers to your questions, so I'll be back to give you all the answers in a while.
0
 

Author Comment

by:vassot
ID: 20019911
For Msguru, The pc that has this temporary disconnection, can't ping the server but can ping the default gateway.
What do you mean by a layer-3 device between the users of the network and the servers?
The networking devices are Hp Procurve 2824, Hp Procurve 1524 and Intel 510T Express, Intel 460T switches. We also have small switches (brand Level1, Compex) of 5 or 8 ports throughout the network. The PCs have various brands and models of network cards either 100 or 1000 Mbps. None of these cards work in auto-save energy mode. There aren't any similarities between the users or the pcs. Almost every pc loses connection to one or more servers regardless of their network location and the switch. The pc models are various.
0
 

Author Comment

by:vassot
ID: 20019984
For SysExpert, WINS was installed lately (3 days ago). The problem pre-existed (for almost two months) before WINS. Anyway, we will remove it from the machines again. As far as DNS and AD testing is concerned, we have used netdiag and dcdiag and all tests were passed. Can you suggest a tool for DNS Testing? We have already checked the DNS through the DNS console.
0
 

Author Comment

by:vassot
ID: 20020276
This is an error in the Application event log of a server:

Replication of license information failed because the License Logging Service on server \\ARTEMIS could not be contacted.

Computer ATHENA
Source LicenseService
Category None
Event ID 213

Please note that ARTEMIS is the Domain Controller and that this error appears in the Application event log of another server.

These servers have Windows 2000 operating system.
ARTEMIS, the domain Controller, has Windows 2003 operating System.
0
 
LVL 4

Expert Comment

by:msguru
ID: 20020783
Hi vassot,

The fact that you could ping the default gateway, but not the server is a big clue!

To answer your question about 'layer-3' devices - these could be a router or a layer-3 switch (which is effectively routing done by an enhanced, 'layer-3' switch).

Now can you cover some presumptions for me:-

P1) If all your workstations and servers are on the same network, and not going through a 'layer-3' device - then we can rule this out.  It looks like they are all on a 192.168.0.x network - just to be sure can you confirm that?

P2) Also, can you confirm the presumption that your workstations don't go through the endian firewall to get to the servers?

P3) I presume the endian firewall is the default gateway 192.168.0.127, is that right?
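
P1 can be checked from the ipconfig values posted above. Here is a minimal Python sketch using only the standard ipaddress module; the function name same_subnet is made up for illustration, and the 10.1.1.5 address is a hypothetical off-subnet host added for contrast, not an address from this thread:

```python
import ipaddress

def same_subnet(host_ip: str, netmask: str, other_ip: str) -> bool:
    """Return True if other_ip lies inside the host's directly attached network."""
    network = ipaddress.ip_interface(f"{host_ip}/{netmask}").network
    return ipaddress.ip_address(other_ip) in network

# Address and mask of the workstation, taken from the ipconfig /all output.
HOST, MASK = "192.168.0.177", "255.255.255.0"

for label, addr in [
    ("default gateway", "192.168.0.127"),
    ("DNS/WINS server", "192.168.0.192"),
    ("hypothetical host", "10.1.1.5"),  # assumption, for contrast only
]:
    if same_subnet(HOST, MASK, addr):
        where = "same subnet, reached directly"
    else:
        where = "other subnet, reached via the gateway"
    print(f"{label:18} {addr:15} {where}")
```

Anything reported as "same subnet" should be reached without passing through a router or the firewall, which is what P1 and P2 are trying to establish.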

Now, here's a few things to test/try - this will narrow down the area to look at dramatically:-

T1)
a) When the problem happens on the suspect computer (it loses connectivity to a server), ping from each of the servers to the default gateway, if that works OK ping the other servers from each server.
b) When the problem happens on the suspect computer, again - do the ping to the server that connectivity dropped to, and also a ping to the default gateway.  At the same time, go to a computer that has NO reported problem, and do the same pings.  This may show that the computers that had no reported problem actually had connectivity issues as well (maybe just no comms were being done from those computers at that very specific time).
Depending on the length of the outages (how long does the problem server not respond to a ping?), you may have time to ping other things as well... if you do have the time, ping the other servers, and an external IP like ping www.novell.com as well.  That would provide valuable information.

T2)
a) First, look at the path of cables and switches that the problem workstation would have taken to go through to reach the default gateway (I presume this is 192.168.0.127, as you mention in the IPCONFIG above).
b) Now look at the path of cables and switches that the problem workstation would have taken to reach the server (or servers) that didn't respond.
c) Finally - what cables and switches are unique to b), that would not have been taken in a) ?
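
The T1 checks are easier to compare if every target is pinged in the same round and each result is timestamped. A rough Python sketch of that idea follows; it shells out to the operating system's ping tool, the helper names (ping_once, format_row, watch) are invented for this example, and the target list is something you would fill in yourself:

```python
import platform
import subprocess
import time
from datetime import datetime

def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one echo request with the OS ping tool; True if it reports success."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    # Caveat: Windows ping can exit with 0 on "Destination host unreachable"
    # replies, so treat the result as a hint rather than proof.
    return subprocess.run(cmd, capture_output=True).returncode == 0

def format_row(stamp: str, results: dict) -> str:
    """One log line: timestamp followed by name=ok/FAIL for every target."""
    cells = [f"{name}={'ok' if up else 'FAIL'}" for name, up in results.items()]
    return f"{stamp}  " + "  ".join(cells)

def watch(targets: dict, rounds: int, interval_s: float = 2.0) -> None:
    """Ping every target once per round so outages can be lined up in time."""
    for _ in range(rounds):
        stamp = datetime.now().strftime("%H:%M:%S")
        results = {name: ping_once(ip) for name, ip in targets.items()}
        print(format_row(stamp, results))
        time.sleep(interval_s)
```

For example, watch({"gateway": "192.168.0.127", "dns": "192.168.0.192"}, rounds=30) run on both the suspect PC and a healthy PC gives two logs that can be compared line by line. Add the affected servers to the dictionary with their real addresses.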

Please try these and let us know!

Best of luck!
0
 
LVL 4

Expert Comment

by:msguru
ID: 20020801
Hi vassot,

I think you can eliminate the error in replicating licenses - it would not be causing network connectivity issues, however - it may be a *result* of network connectivity issues!

I think you should troubleshoot that error after resolving the network connectivity issues.

Cheers!
0
 

Author Comment

by:vassot
ID: 20040106
Dear msguru, sorry for the delay but things are getting worse each day.

First of all I would like to tell you that if I use the command ping -t from a workstation, I see that the connection to the gateway is lost instantly quite often. Also, when we use pathping on the gateway, we see a great deal of loss of packets.

P1) We are not using a 'layer-3' device, just the devices we have mentioned earlier plus a Zyxel P-660H-D1 router. All workstations are on the 192.168.0.x network except two servers that are on a 10.0.0.x network(DMZ).
P2) When we use tracert from a workstation to get to a server, we see that the gateway is not in the routing path (there is a direct connection to the server).
P3) Yes, the Endian firewall is the default gateway.
T1)
a) When a problem happens, the server to which the pc can't connect, can connect successfully to the gateway as well as the other servers.
b) If we exclude the computer with the problem, the other computers that we check randomly are pinging the server. Also, the computer with the problem can ping the gateway.
T2)
a, b, c) The physical path to the server and the firewall is exactly the same: all of the servers and the gateway are directly connected to the same switch, and all the other switches (to which the workstations are connected) are connected to that switch, too. Please note that the cable of the firewall was replaced to exclude such a problem.
0
 

Author Comment

by:vassot
ID: 20040130
We also suspect that we may have Windows licensing or other security issues (Kerberos, Ldap etc). Please give us any ideas.
0
 
LVL 26

Expert Comment

by:Fred Marshall
ID: 20042590
You say:
"First of all I would like to tell you that if I use the command ping -t from a workstation, I see that the connection to the gateway is lost instantly quite often. Also, when we use pathping on the gateway, we see a great deal of loss of packets."

So, I'm back to my initial observations pointing to hardware - only now more strongly stated...!  I refer to "lost instantly quite often".

I would investigate cables / cable terminations first.
An intermittency in a switch or router could cause this as well.

You might try using Ethereal (a free download) on a laptop to observe the traffic here and there if you've not already done that.  Perhaps insert a hub (not a switch) at a likely spot to observe the traffic reliably.

Unless you're able to pinpoint a problem rather exactly, it is often impossible to reject the notion of hardware failures/intermittencies.  They can act weird and you can use up a lot of time assuming that they don't exist.

It looks like you're on the right track.  I don't fully understand the topology of your network but you should definitely have it drawn out so that you know where all the physical paths really are.  Then when there's a path failure you can see it on the diagram.  Then when there's another path failure you can see that one as well.  Look for a common physical path element - be it wire or a box.
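
Once the diagram exists, finding the common element can be done mechanically. This is only a sketch in Python: each failed path is written down as a list of devices and links taken from your diagram, and the device names below are placeholders, not your real equipment:

```python
def common_elements(failed_paths):
    """Return the elements shared by every failed path, in first-path order."""
    if not failed_paths:
        return []
    shared = set(failed_paths[0]).intersection(*failed_paths[1:])
    return [hop for hop in failed_paths[0] if hop in shared]

# Hypothetical failed paths, each read off the network diagram by hand.
failed = [
    ["pc-17", "switch-A", "main-switch", "server-1"],
    ["pc-42", "switch-B", "main-switch", "server-2"],
    ["pc-03", "switch-A", "main-switch", "server-1"],
]
print(common_elements(failed))  # ['main-switch']
```

If the result is empty, no single box or cable explains all the failures, which is itself useful to know.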

This doesn't feel like software to me!
0
 

Author Comment

by:vassot
ID: 20042658
Dear fmarshall, thank you for your observations. The cable of the firewall to the switch was replaced. All other servers that do not have any loss of packets are connected to the same switch. Do you suggest a problem in the network card of the firewall because there is nothing else left to check. And could this loss of packets cause trouble to the connection of the workstations with the other servers?
0
 
LVL 26

Expert Comment

by:Fred Marshall
ID: 20042869
OK- well to respond in any reasonable way I'm going to ask you for the *physical* network topology.

Each client is wired to .... what?
Each hub, switch, ... is wired to what? (in addition to clients)
Where *is* the "gateway"?  In the Endian, or separate?
Each server is wired to .... what?
The AD/WINS/DNS server is wired to .... what?  Is this .192 ??

Then, if we go back through your messages, which paths on this diagram have failed and/or which computers have failures/problems?  Which ones don't?

I can't envision how things are contributing otherwise...  
For example, when you say that you replaced the cable of the firewall to "the switch", I don't know what that means to the problems described without knowing where the switch is in the topology.  I can *guess* but that's dangerous for anyone to do..
0
 
LVL 4

Expert Comment

by:msguru
ID: 20042898
Hi Vassot,

I think it's best to resolve your network issues first, then move onto the other issues (Windows licensing or other security issues - Kerberos, Ldap etc) IF THEY STILL EXIST!  Dropped packets will cause these other issues, but these issues should not cause dropped packets.

Based on your feedback, I think it's time to look at your managed switch ports for errors, and also look at ethernet auto-negotiation issues...

First, you have managed switches (at least the HP procurve 2824 is - maybe you can fill me in on the others).  You should be able to log into the management page of the switch (or, less easy - use a console connection) and look at the error statistics on the ports.  First thing is 'zero' all the stats, then look at which ports the errors are happening on.

Now focus on these 'high error' ports (note that there will ALWAYS be some errors that happen on an occasional basis - but there shouldn't be errors clocking up every second!).
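
If the counters can't be zeroed, two readings taken a few minutes apart give the same information. A small Python sketch of that comparison; the port numbers and counts are made up, and the values are assumed to be copied by hand from the switch's port statistics page:

```python
def error_deltas(before, after):
    """Per-port increase in error counters between two snapshots."""
    return {port: after[port] - before.get(port, 0) for port in after}

def busiest_ports(before, after, threshold=10):
    """Ports whose error count grew by more than `threshold`, worst first."""
    deltas = error_deltas(before, after)
    noisy = [(port, d) for port, d in deltas.items() if d > threshold]
    return sorted(noisy, key=lambda item: item[1], reverse=True)

# Made-up error counters per port number, ten minutes apart.
snap_0900 = {1: 0, 2: 3, 3: 0, 24: 12}
snap_0910 = {1: 0, 2: 4, 3: 250, 24: 1900}
print(busiest_ports(snap_0900, snap_0910))  # [(24, 1888), (3, 250)]
```

The threshold is arbitrary; the point is to separate ports with the occasional error from ports where the count climbs steadily.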

Look at the speed/duplex on that port - is it fixed or is it auto-negotiate?  Auto-negotiate can be a big trouble-maker, and mis-matches of either negotiated or manually set duplex are so common.  Both sides should either be on auto, or they should be on the exact same fixed speed AND duplex (half or full).

When speed/duplex is sorted out on a particular 'high error' port, check to see if it's still clocking up a lot of errors - if so, then check the cable and the 'health' of the network device on the other end (whether it be another switch or an interface on a router) - consider changing that network device (even if temporarily, just as a test).

You do have the option to 'go all guns blazing' and just fix EVERYTHING to the same speed and full duplex (where you can).

At a minimum, I'd fix all the servers speed/duplex on each server and on the switch port they go into - making sure you're using the most appropriate switch as the 'core' switch.  At a glance, I would choose that HP Procurve 2824.  By the way, this switch supports 'layer-3 routing' according to the HP spec sheet.

Best of luck!
0
 

Author Comment

by:vassot
ID: 20043009
fmarshall, msguru thank you very much. I'll post you the test results tomorrow.
0
 

Author Comment

by:vassot
ID: 20047513
Fmarshall, each client is wired to a switch and each of these switches is connected to the main switch (which is the HP Procurve 2824). The gateway (Endian Firewall) is a machine connected directly to the main switch. Also, each server is wired directly to the main switch. The AD/WINS/DNS server is also wired directly to the main switch. Yes, the AD/WINS/DNS server is a 192.168.0.x server. The only path that has dropped packets all the time is between any workstation or server and the firewall. (When we use pathping we always have a packet loss). On the contrary, pathping never shows packet loss on any other path, except when a workstation/server totally loses connection with a server (ping is impossible anyway).
0
 
LVL 4

Expert Comment

by:msguru
ID: 20048612
Vassot,

If you can, get onto the console of the 2824 switch and do pings to the Endian firewall - OR - get onto the Endian firewall and ping the 2824.  See if there are dropped packets there - if so, look at fixing the speed and duplex on both ends & then re-test.  If still a problem, talk to Endian support - it could be an interface problem.

However, the Endian firewall shouldn't affect the traffic between the workstation and the server - if the server is on the 2824.

You will gain a lot of insight by having the web or console access to the 2824 and looking at the port statistics - and also on the other switches (if they are managed).

Also, I'll re-iterate the recommendation above, to fix all the servers speed/duplex on each server and on the 2824 switch port they go into.  It also would be good to make sure the network card drivers on the servers are up to date with the latest stable release from the network card vendor's web site.

A general question/suggestion as well:  are there some workstations that are more troublesome than others?  You could go onto those workstations and update the network card drivers, and then connect them directly into your 'core' switch - the 2824.  Then look at the port statistics on the 2824 for the ports that the troublesome workstations are plugged into.

You have a powerful 2824 switch with its own ability to show you where errors are happening!

Let me know how my previous suggestion, and this one are going!

Best of luck!
0
 
LVL 26

Expert Comment

by:Fred Marshall
ID: 20051878
OK - so the common path in the failure is between the firewall and the main switch.  That suggests hardware components:
- port on the firewall
- the cable between the firewall and the main switch (which you replaced)
- port on the main switch.

I would change the physical port on the main switch that goes to the firewall.  See if that changes things for the better.  I've definitely seen individual ports go out or become flaky.  Then, do the same on the firewall if you can.

I should check this:  in your original post you said that the connections to the servers were failing.  In your last post you say that packet losses are between the main switch and the firewall.  So, I addressed the latter.  But, does the former still apply?
0
 
LVL 4

Expert Comment

by:msguru
ID: 20052152
Hang on:-  there's still these problems...

"The pc that has this temporary disconnection, can't ping the server but can ping the default gateway."
"Almost every pc loses connection to one or more servers regardless of their network location and the switch"

So, you can't just concentrate on the Endian to 2824 switch connection.

If anything, the common component is the 2824 switch.  Consider swapping the whole thing out!

BUT, please do go through everyone's recommendations above!!!
0
 
LVL 26

Assisted Solution

by:Fred Marshall
Fred Marshall earned 200 total points
ID: 20054062
That's why I asked about the *current* validity of the earlier assertions.
0
 

Author Comment

by:vassot
ID: 20063803
The firewall has loss of packets even when it is connected through another switch to a single server. Also, please note that the machine of the firewall was replaced, the firewall was re-installed and all network cards were replaced. The main switch was replaced by another one in a test but there were no changes (we had dropped connections again).
0
 
LVL 4

Expert Comment

by:msguru
ID: 20065706
Fix the speed (to the highest supported speed) and duplex (to full-duplex) on both ends (switch port & the firewall network card), then see if there is still packet loss.
0
 

Author Comment

by:vassot
ID: 20077026
We have fixed the speed to 100 full duplex on the switch port but we can't set this manually on the firewall, which auto-detects the settings of the port. However, when we set full-duplex, the firewall sees it as half-duplex and a large number of errors occur on the switch port.
0
 
LVL 4

Expert Comment

by:msguru
ID: 20077068
Vassot,
As described earlier, "Both sides should either be on auto, or they should be on the exact same fixed speed AND duplex (half or full)".  If Endian runs on Linux, then you need to know the Linux commands (or a way to do it through the GUI) to set the speed and duplex of the network card.  I can't help you on the Linux side.  But surely it must be configurable.
You could break the above quoted rule for a short test, and set the port the Endian is plugged into - to 100 megabits and half-duplex, then check the error count.  If the error count goes up, try re-booting the Endian firewall & check the error stats again.
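
The half-duplex result reported above is the usual outcome on a 10/100 link when one end is forced and the other is left on auto: the auto end can sense the speed but gets no duplex information, so it falls back to half duplex. A simplified Python model of that rule follows. It is a sketch of the general behaviour, not of the Endian or ProCurve implementation, and the function names are made up:

```python
def resulting_duplex(side_a: str, side_b: str) -> tuple:
    """Model the duplex each end of a 10/100 link ends up using.

    Each side is "auto", "full" or "half". Returns (duplex_a, duplex_b).
    Assumes two auto ends agree on full duplex, and that an auto end facing
    a forced peer falls back to half duplex.
    """
    def settle(me, peer):
        if me != "auto":
            return me  # a forced setting is used as-is
        return "full" if peer == "auto" else "half"
    return settle(side_a, side_b), settle(side_b, side_a)

def is_mismatch(side_a: str, side_b: str) -> bool:
    """True when the two ends would run with different duplex settings."""
    a, b = resulting_duplex(side_a, side_b)
    return a != b

for switch_port, firewall_nic in [("auto", "auto"), ("full", "auto"),
                                  ("half", "auto"), ("full", "full")]:
    a, b = resulting_duplex(switch_port, firewall_nic)
    flag = "MISMATCH" if a != b else "ok"
    print(f"switch={switch_port:4} firewall={firewall_nic:4} -> {a}/{b} {flag}")
```

In this model only the full/auto combination produces a mismatch, which is why forcing the switch port to 100 half (or putting it back on auto) is a reasonable short test when the firewall end can't be changed.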
0
 

Author Comment

by:vassot
ID: 20077101
Msguru,  thanks for your advice but we have finally decided to replace the firewall with another one, so we won't do any more tests. But we wonder if this problem could be responsible for the whole problem of our network. What do you think?
0
 
LVL 4

Expert Comment

by:msguru
ID: 20077136
Hi Vassot,

Something of interest...

http://kb.endian.com/entry/29/

Q: "Why do i have packet loss with some devices if I ping Endian Firewall?"
A: "Endian Firewall has a DoS attack protection which limits ICMP packets to 1 packet per second if more than 5 packets come in too fast."

So, the packet loss with *multiple* PINGs to the Endian would be normal!  Just ONE device pinging the Endian should be OK (but you have to be sure nothing else is PINGing it).
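
To see why several pingers lose packets while one does not, here is a small Python simulation. It assumes the protection behaves like a simple token bucket (5 packets of burst, refilled at 1 packet per second); that is my reading of the KB wording, not a confirmed description of how Endian implements it:

```python
def simulate(pingers: int, seconds: int, burst: int = 5, rate: float = 1.0):
    """Token-bucket model of the ICMP limit.

    Each of `pingers` hosts sends one echo request per second for `seconds`
    seconds. Returns (sent, answered).
    """
    tokens = float(burst)
    sent = answered = 0
    for _ in range(seconds):
        for _ in range(pingers):
            sent += 1
            if tokens >= 1.0:
                tokens -= 1.0
                answered += 1
        # Refill once per simulated second, never above the burst size.
        tokens = min(float(burst), tokens + rate)
    return sent, answered

for n in (1, 3):
    sent, answered = simulate(pingers=n, seconds=60)
    loss = 100.0 * (sent - answered) / sent
    print(f"{n} pinger(s): sent={sent} answered={answered} loss={loss:.0f}%")
```

In this model a single host pinging once per second never loses a packet, while three hosts together lose well over half of theirs once the initial burst is used up. Any tool that probes faster than once per second would hit the limit in the same way.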

If there were no errors clocking up on the switch port that the Endian is connected to BEFORE you changed the speed/duplex - then, leave the settings as they were.
0
 
LVL 4

Accepted Solution

by:msguru
msguru earned 1800 total points
ID: 20077162
To answer your question: "But we wonder if this problem could be responsible for the whole problem of our network. What do you think?".
How about running your system for a period of time without the firewall connected at all?  I realise that would probably mean a loss of internet and/or DMZ access for a while... but how about trying it after hours - especially from a "troublesome" computer that was losing connectivity.  This would surely answer your question once & for all!
0
 

Author Comment

by:vassot
ID: 20122216
It was the firewall after all!
0
