Avatar of MECIT
MECIT

Network tools for Networking Issues

What tools could I use to troubleshoot networking issues?

We are seeing intermittent network problems between two servers.

One is virtual and the other is physical.

The outages last 20 seconds or more before pings start responding again.
Avatar of dmarinenko
dmarinenko

Wireshark works well.
It is a free packet analyzer: http://www.wireshark.org/

Also try running "tracert ip-of-other-comp" at a command prompt. This lists every gateway (hop) between the two machines along with its response time.
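Since the outages described above last around 20 seconds, a timestamped ping log can help line them up with switch or host events. Here is a minimal Python sketch, assuming the operating system's ping command is on the PATH. The function names (ping_once, find_outages, monitor) are made up for illustration, and the exit code of ping is only a rough indicator (on Windows it may be 0 for some "unreachable" replies).

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one echo request with the OS ping command; True if it exits with 0."""
    if platform.system() == "Windows":
        # Windows: -n = count, -w = timeout in milliseconds
        cmd = ["ping", "-n", "1", "-w", str(int(timeout_s) * 1000), host]
    else:
        # Linux: -c = count, -W = timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(int(timeout_s)), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def find_outages(samples, min_duration_s=0.0):
    """Find runs of failed pings.

    samples: list of (timestamp_seconds, ok) tuples in time order.
    Returns (start, end, duration) tuples, where end is the timestamp of the
    first success after the run, or of the last failure if the log ends first.
    """
    outages = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            if ts - start >= min_duration_s:
                outages.append((start, ts, ts - start))
            start = None
    if start is not None and last_fail - start >= min_duration_s:
        outages.append((start, last_fail, last_fail - start))
    return outages


def monitor(host, duration_s, interval_s=1.0):
    """Ping host roughly once per interval_s for duration_s seconds."""
    samples = []
    t_end = time.time() + duration_s
    while time.time() < t_end:
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval_s)
    return samples
```

For example, find_outages(monitor("ip-of-other-comp", 3600), min_duration_s=10) would list every gap of 10 seconds or more seen during one hour. Running it from both servers at the same time shows whether both sides lose connectivity together.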
Avatar of Andrew Porter
What sort of network hardware do you have between the two servers? You could use the logging feature of any Cisco equipment to identify collisions and such.
Avatar of MECIT

ASKER

I am using three Dell PowerConnect 6248 gigabit switches.

I ran tracert on both computers.
Each showed 1 hop at 1 ms to the other server.
What kind of network adapters do you have on the machines?  
I have seen this exact issue with Atheros NICs.
Have you tried connecting different computers to the virtual to figure out whether it's an issue with the virtual or the other computer?
Have you updated the NIC drivers to the latest versions?
If this is a VMware VM, I would look at the resources of all your VMs and make sure you're not ballooning.

Also, is it only between this one VM on this host and this one physical box? Or can you run a constant ping to the VM from any device on your network and have the ping fail, or vice versa with the physical box?
It seems odd that you only have issues with just these two specific devices talking to each other. Also, if possible, you could try moving them to different switch ports.
If it is only these two boxes, are you pinging by IP or by host name?

Have there been any network or hardware changes recently that may have triggered this, or is one of these servers new?
Avatar of MECIT

ASKER

On the virtual, it is using a Broadcom NetXtreme II BCM5708 and an Intel 82575GB.

On the physical, it is using a Broadcom NetXtreme II BCM5708.

I'm looking to see if there are new drivers for the physical.
First, check with ping:

From server A, ping server A and server B at the same time.
From server B, ping server B and server A at the same time.

What do you see? Is there any packet loss to a server's own interface?
Avatar of MECIT

ASKER

I have updated the firmware on the switches and I am updating the drivers on the VM Hosts.
Once completed I will run the tests again.

On the PowerConnect 6248, is there an option to monitor the ports that are connected to the physical server?

How can I troubleshoot from the switches?
Avatar of MECIT

ASKER

Server A and Server B ping each other at 1 ms, but after a few 1 ms replies the times rise to 13 ms on one and 10 ms on the other.
Then they both return to 1 ms.
You have 3 switches between the 2 servers. Are they manageable, and do they have IP addresses assigned?
If so, ping each switch step by step to find where the high latency is.

You also need to check the traffic usage of each switch. Does this switch support SNMP?
If it does, you could configure Cacti or MRTG and check the utilization of each port. This could give you an idea.
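For reference, tools like Cacti and MRTG derive port utilization from two readings of an interface octet counter (for example ifInOctets or ifOutOctets from the standard IF-MIB) taken some seconds apart. The arithmetic can be sketched in Python as follows. The function name and the sample numbers are illustrative only, and the sketch assumes the counter wrapped at most once between the two readings.

```python
def port_utilization_percent(octets_start, octets_end, interval_s,
                             link_speed_bps, counter_bits=32):
    """Percent of link capacity used between two octet-counter readings.

    octets_start / octets_end: counter values at the start and end of the interval.
    interval_s: seconds between the two readings.
    link_speed_bps: nominal link speed in bits per second (1e9 for gigabit).
    counter_bits: counter width (32 for ifInOctets, 64 for ifHCInOctets).
    """
    if interval_s <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    delta = octets_end - octets_start
    if delta < 0:
        # The counter wrapped around once between the two readings.
        delta += 2 ** counter_bits
    bits = delta * 8
    return bits / (interval_s * link_speed_bps) * 100


# Illustrative numbers: 37,500,000 octets in 5 minutes on a gigabit port.
print(port_utilization_percent(0, 37_500_000, 300, 1_000_000_000))
```

A port that sits near 100% during the slow periods points to congestion, while a nearly idle port with poor throughput points elsewhere (duplex, errors, or the host).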
Avatar of MECIT

ASKER

They are managed and stacked with one IP.

This is what I get when pinging the switch:

1ms
2ms
2ms
1ms
5ms
1ms
1ms
1ms
2ms
3ms
2ms
1ms
1ms
1ms
2ms
6ms
1ms
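A quick way to judge a run of ping times like the one above is to summarize it. This is a small Python sketch using only the standard library; the list holds the 17 values posted above, and the summarize helper is a name chosen for illustration.

```python
import statistics

# Round-trip times in ms, copied from the ping output posted above.
rtts_ms = [1, 2, 2, 1, 5, 1, 1, 1, 2, 3, 2, 1, 1, 1, 2, 6, 1]


def summarize(rtts):
    """Return basic latency statistics for a list of round-trip times."""
    return {
        "count": len(rtts),
        "min": min(rtts),
        "max": max(rtts),
        "mean": statistics.mean(rtts),
        "median": statistics.median(rtts),
        "stdev": statistics.stdev(rtts),  # sample standard deviation, a rough jitter figure
    }


summary = summarize(rtts_ms)
for name, value in summary.items():
    print(name, round(value, 2))
```

Occasional 5 to 6 ms spikes against a 1 ms median are small in absolute terms, so a summary like this mostly helps compare one path against another rather than prove a fault by itself.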
Latency becomes higher for several reasons:
high usage, a memory/CPU issue, a firewall, a routing issue, etc.

However, only 6 ms shouldn't be a problem for communication or data transfer between two servers.
You indicated that you are facing a network issue. What exact network problem are you facing? Slow data transfer? Packet loss? Or high latency?
Avatar of MECIT

ASKER

The results were from my laptop to the switch.

We are having slow data transfers and some packet loss.
Please check from your server to the switch. Do a ping test first; then you need to check throughput.
To check throughput between two servers you can use iperf:
http://openmaniak.com/iperf.php

Download the Linux version:
http://sourceforge.net/projects/iperf/files/latest/download
For the Windows version:
http://www.mayoxide.com/iperf/iperf-2.0.5-cygwin.zip
Avatar of MECIT

ASKER

Do I install the application on both servers?

Could I install it on my laptop, or does it need to be a server?
Iperf is a tool for checking throughput between two points. You can install iperf on both servers to check throughput between them. Or, if you want to check throughput between a laptop and a server, install one copy on the laptop and another on the server.

But throughput might vary with computer performance, configuration, or even operating system. So it is better to test between the two servers.
Avatar of MECIT

ASKER

Do I just place a copy on the C: drive of each server?

How do I get it to install?

Does this open its own application, or am I going to be running it in the command prompt?
Avatar of MECIT

ASKER

This is the server end results
Interval - 10.2 secs
Transfer - 6.38 MBytes
Bandwidth - 5.23 Mbits/sec

These are the client side results
Interval - 10.2 secs
Transfer - 6.38 MBytes
Bandwidth - 5.27 Mbits/sec

What do the results mean?
That is very poor performance.

Keep a copy of iperf on both servers. Then open a command prompt on one server and go to the iperf directory with the 'cd' command.
Then run "iperf -s"                 //iperf -s indicates server mode
From the other server use
iperf -c "Iperf Server IP"                  //This is client mode.

If you get the same results, it means there is some problem in the network.
In that case I'd suggest checking throughput step by step:

1. Server to laptop using a cable
2. Server to laptop through the switch.

Then you can work out what is causing the low throughput.
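To put the reported iperf numbers in context, the bandwidth figure is just the transferred data divided by the interval. Here is a short Python sketch of that arithmetic. It assumes iperf counts an MByte as 1024*1024 bytes and an Mbit as 1,000,000 bits, which matches the figures posted in this thread to within rounding; the function names are illustrative.

```python
def iperf_mbits_per_sec(transfer_mbytes, interval_s):
    """Convert an iperf 'Transfer' figure (MBytes) and interval into Mbits/sec.

    Assumption: 1 MByte = 1024 * 1024 bytes, 1 Mbit = 1,000,000 bits.
    """
    bits = transfer_mbytes * 1024 * 1024 * 8
    return bits / interval_s / 1_000_000


def percent_of_link(mbits_per_sec, link_mbits_per_sec=1000):
    """How much of the nominal link speed (default: gigabit) the result represents."""
    return mbits_per_sec / link_mbits_per_sec * 100


# Figures from the server-side result above: 6.38 MBytes in 10.2 seconds.
rate = iperf_mbits_per_sec(6.38, 10.2)
print(round(rate, 2), "Mbits/sec")
print(round(percent_of_link(rate), 2), "% of a gigabit link")
```

So roughly 5 Mbits/sec is about half a percent of what a gigabit link can nominally carry, which is why the result counts as very poor even though the link is up.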
Avatar of MECIT

ASKER

I called Dell tech support about the dropped packets.

They stated it might have to do with the spanning tree settings. But they are only doing a best effort, since I only have hardware support.

I have no idea what spanning tree does or doesn't do. Do you think that could be the issue?
Of course, spanning tree could be an issue for throughput. But there are other configurations that could also cause slow transfers.
To learn about spanning tree:
http://www.enterprisenetworkingplanet.com/netsp/article.php/3580966/Networking-101-Understanding-Spanning-Tree.htm
Avatar of MECIT

ASKER

What other configurations could affect it?

I enabled PortFast on some ports. I set the bridge priority to 4096 so that the switch becomes the root of the spanning tree. For some reason, the root ID was pointing to a WAP.
This was the best-effort advice from Dell.

I am still seeing dropped packets, and I also ran the iperf test again and had the same results.

What else can I do?
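On why lowering the priority to 4096 moves the root: in spanning tree, the bridge with the lowest bridge ID wins the root election, and the ID is compared by priority first and MAC address second. The default priority is 32768, so with everything left at default the device with the lowest MAC address becomes root, which is one possible reason a WAP ended up there. A rough Python sketch of the election follows; the device names and MAC addresses are made up for illustration.

```python
def bridge_id(priority, mac):
    """Bridge ID as a comparable tuple: priority first, then MAC as an integer."""
    mac_value = int(mac.replace(":", "").replace("-", ""), 16)
    return (priority, mac_value)


def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID.

    bridges: dict mapping a device name to a (priority, mac) tuple.
    """
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


# Hypothetical devices, all at the default priority of 32768.
bridges = {
    "switch-stack": (32768, "00:1e:c9:00:00:10"),
    "wap": (32768, "00:0b:85:00:00:01"),
}
root_before = elect_root(bridges)   # lowest MAC wins the tie

# Lower the stack's priority to 4096, as described above.
bridges["switch-stack"] = (4096, "00:1e:c9:00:00:10")
root_after = elect_root(bridges)

print(root_before, "->", root_after)
```

Priority values are set in steps of 4096 on most switches, so 4096 is the lowest non-zero setting; another bridge configured with 0 would still win.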
Leave the switch out and connect your server and laptop directly with a cable. Then run the iperf test between the laptop and the server. Make sure the server and laptop perform OK. Then you can move on to the switch.
Avatar of MECIT

ASKER

The majority of our servers are virtual. The physical servers are our critical servers, and at this time I cannot unplug the NIC to do the testing.

Can I do it from desktop to laptop? Will this work?
OK, you could check between the desktop and laptop using a cable first. Then connect both the laptop and desktop to the same switch and test again.
Avatar of MECIT

ASKER

here are my results

Desktop to Laptop

Interval- 10 sec
Transfer- 68 MBytes
Bandwidth-57 Mbits/sec
 
desktop to switch to laptop

Interval- 10 sec
Transfer- 90.4 MBytes
Bandwidth-75.8 Mbits/sec

Server to laptop

Interval- 10.2 sec
Transfer- 6.25MBytes
Bandwidth-5.16Mbits/sec
Avatar of MECIT

ASKER

The laptop to switch to desktop test was connected to a different switch.
You could identify the problem by testing through the same switch that you used for the server. Also check your servers' NIC speed settings.
Avatar of MECIT

ASKER

I did not understand the first part of your comment.


Both servers - NIC speed is set to auto
Use the same switch that is connected to your server. Connect your laptop and desktop to 2 free ports of that switch and test between the laptop and desktop. Then you can find out if there is any difference.
Avatar of MECIT

ASKER

There was; see the server to laptop results:

Server to laptop

Interval- 10.2 sec
Transfer- 6.25MBytes
Bandwidth-5.16Mbits/sec
You didn't get my point.

You already tested between the laptop and desktop, which looks fine. But you used a different switch.
So now use the switch that you used for the server to laptop test.

Please check whether there are 2 free ports on that switch. Then connect both the laptop and desktop to that switch and test between them. Basically, you are going to test that switch.
Avatar of MECIT

ASKER

Interval- 10 sec
Transfer- 6.12MBytes
Bandwidth-5.14Mbits/sec

here are the results
So, it's the configuration of your switch.

Could you please disable spanning tree for a while? Then you can test again to see whether spanning tree is the reason. You can check other configurations one by one too.
Avatar of MECIT

ASKER

Would it affect the network more if I disable spanning tree?
Will the switch need to reboot?
You can disable spanning tree temporarily. After the test you can enable it again. I don't think it requires a reboot. However, depending on the switch model, if a reboot is required it will prompt you.

Before changing any configuration, it is better to take a snapshot of each page or a configuration backup. It will help you restore the configuration later.
Avatar of MECIT

ASKER

I have disabled spanning tree and flow control.

Still getting the same results
So, it's not spanning tree but some other configuration of your switch. Are you using VLANs? Try connecting to other ports.
Avatar of MECIT

ASKER

No VLANs.
I connected two laptops to each Dell switch.
1st test: 223 Mbits/sec, laptop 1 server; laptop 2 client

2nd test: 141 Mbits/sec, laptop 2 server; laptop 1 client,
keeping the same switch and the same ports on the switch


Laptop and physical server
3rd test: 84.3 Mbits/sec, server was the server; laptop was client

4th test: 4.98 Mbits/sec, server was client; laptop was server, with the same ports and same switch
Which switch are you using? Simply run
iperf -c "server IP"        //for client
iperf  -s                          //for server

If it is the same port and same switch in both cases, you should get the same throughput. However, it seems some port configuration is causing different throughput in different traffic directions.

Finally, to be sure, use laptop 2 and the physical server:
Physical server as server and laptop 2 as client
Laptop 2 as server and physical server as client.

If you get the same result, it must be your switch. It might be some switchport configuration.
Avatar of MECIT

ASKER

Dell support recommended breaking the stack.
Sw 1 prt 1 --> Sw2 prt 1 ; Sw2 prt 2 --> Sw3 prt 1


If I do the iperf test from laptop to laptop I am getting

iperf -c x.x.x.x -d

transfer                    bandwidth
244MB                         205Mb
235MB                         195Mb
 
If I do the iperf test from laptop to server I get
 
iperf -c x.x.x.x -d

transfer                    bandwidth
271MB                         227Mb
6MB                               5.03Mb
iperf -c x.x.x.x -d measures bidirectional throughput. From your test we can conclude that the server's upload throughput is too low. So there are only two possible reasons:

1. Server upload capacity
2. Configuration of the switchport connected to that server.

If possible, connect your server to another switchport that has already been tested. If the throughput is the same, then the problem is your server. Otherwise, it's that specific switchport.
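The reasoning above (one direction is fine, the other is not) can be written down as a simple ratio check, which is handy when repeating the bidirectional test across many servers. This is a hedged Python sketch; the function name and the 2.0 threshold are arbitrary choices for illustration, not an iperf feature.

```python
def compare_directions(forward_mbps, reverse_mbps, max_ratio=2.0):
    """Flag a bidirectional result whose two directions differ by more than max_ratio."""
    if forward_mbps <= 0 or reverse_mbps <= 0:
        raise ValueError("throughput values must be positive")
    ratio = max(forward_mbps, reverse_mbps) / min(forward_mbps, reverse_mbps)
    return {
        "ratio": ratio,
        "asymmetric": ratio > max_ratio,
        "slow_direction": "forward" if forward_mbps < reverse_mbps else "reverse",
    }


# Laptop to laptop, from the results above: roughly symmetric.
laptop_pair = compare_directions(205, 195)

# Laptop to server, from the results above: one direction collapses.
laptop_server = compare_directions(227, 5.03)

print(laptop_pair)
print(laptop_server)
```

A large ratio on every server, on different switches and cables, suggests looking at something the servers share (NIC settings, drivers, offload options) rather than at a single port.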
Avatar of MECIT

ASKER

We did plug it into another switch and port, with the same results.

If it's the server, what would cause that? We have tested iperf on 4 or 5 servers and they come back with the same results.
Are those physical servers? Do they have the same NIC model?
Avatar of MECIT

ASKER

Yes it is a physical server and we purchased a new NIC for the server.

we used a different switch, new cables that were tested.

we get the same results.
How do we improve the server's throughput?
So, it's certain that your server is the reason for this.

However, you need to check server performance first. Depending on the server OS, there are many server performance tools available.

1. Make sure overall server performance is OK.
2. Check memory and CPU usage while you are downloading or running the iperf test.
3. Try disabling the antivirus (if there is one).
4. Stop other unnecessary applications, programs, and services, then try again.
Avatar of MECIT

ASKER

I get this on a few servers:

Exception: STATUS_ACCESS_VIOLATION at eip=6110D923
eax=00000014 ebx=00000000 ecx=FFFFFFFF edx=00000014 esi=00000001 edi=00000014
ebp=1A22C858 esp=1A22C854 program=C:\iperf\iperf.exe, pid 7892, thread unknown (0x2F54)
cs=001B ds=0023 es=0023 fs=003B gs=0000 ss=0023
Stack trace:
Frame     Function  Args
1A22C858  6110D923  (00000014, 1A22CB1A, 00410B8E, 00000001)
1A22CB38  61142926  (1A22D000, 1A22CB58, 00410B88, 1A22CBF8)
1A22CBD8  61118839  (004161F8, 00000020, 00410B88, 9F63D75A)
1A22CC18  610C01A5  (004161F8, 00000020, 30ED8C00, 42C27300)
1A22CC68  00403F9C  (00EC7B20, 00000000, 1A22CCB8, 610E1469)
1A22CC88  004056A0  (00EC7AC0, 00000001, 00000001, 05A1DB2E)
1A22CCC8  004059F5  (00EC7AC0, 00000000, 00000001, 610713B0)
1A22CCF8  00405D19  (00EC7AB8, 00000000, 00000000, 00000000)
1A22CD18  00405DCD  (00EC7AB8, 00412088, 00000000, 00000000)
1A22CD38  00405FBE  (00E31088, 00000000, 00000000, 00E512C8)
1A22CD58  0040980D  (00E31088, 00000000, 00000000, 00000000)
1A22CD98  610E38C5  (00E512C8, 1A22CDD4, 610E3810, 00E512C8)
End of stack trace
Avatar of MECIT

ASKER

I am also getting some of these

connect failed: Connection timed out
It might be a corrupted file, or an antivirus or firewall might be blocking the connection.
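One way to separate a firewall block from an iperf problem is to test whether a plain TCP connection to the iperf port succeeds at all. Below is a minimal Python sketch using the standard socket module. It assumes the iperf server is listening on its default TCP port 5001 (iperf 2); the can_connect name is illustrative.

```python
import socket


def can_connect(host, port=5001, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

With "iperf -s" running on the server, can_connect("x.x.x.x") returning False from the client machine points at a firewall, antivirus, or routing problem rather than at iperf itself. A timeout usually means packets are being dropped silently, while an immediate refusal usually means nothing is listening on that port.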
Avatar of MECIT

ASKER

We tried all 4 steps
We are getting the same results.

I am going through all our servers, virtual and physical, running the iperf test. So far all have the same results.

Should the results be the same as laptop to laptop, around 100 to 200 Mbps?
Could you please test server to laptop directly (without the switch) using a crossover cable?
Avatar of MECIT

ASKER

Same results.
Which OS do you use for your servers?
Avatar of MECIT

ASKER

Windows 2008 and 2003
Download Microsoft Baseline Security Analyzer and check which updates are missing:
http://www.microsoft.com/download/en/details.aspx?id=7558

Also check Windows Server 2003 Performance Advisor:
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=15506

For Windows 2008:
http://msdn.microsoft.com/en-us/library/windows/hardware/gg463394.aspx

Also check the Event Log for any kind of error.
Also post your server's network card settings. Go to Device Manager > NIC Properties > Advanced. Then check all the options one by one and, if possible, post them here.
Avatar of MECIT

ASKER

Intel Pro/1000 MT

Gigabit Master Slave Mode  - Auto Detect
Jumbo Frames - Disabled
Locally Administered Address - empty
Log Link State Event - Enabled
Performance Options -
Adaptive Inter-frame spacing -  Enabled
Flow Control - Generate & Respond
Interrupt Moderation Rate - Adaptive
Receive Descriptors - 256
Transmit Descriptors - 256

Qos Packet Tagging - Disabled
TCP/IP Offloading options - Everything is checked off
Wait for Link - Auto Detect
Change the Receive and Transmit Descriptors to the following values and test each combination with iperf:

1. RxD:128  TxD: 128
2. RxD:256  TxD: 128
3. RxD:256  TxD:  64
4. RxD:256  TxD:  32
5. RxD:256  TxD:  16
6. RxD:128  TxD:  64
7. RxD:64    TxD:  64
8. RxD:16    TxD:256
9. RxD:16    TxD:128

Collect the iperf results and see if there are any differences.

NB: each time you change a value, disable and re-enable the NIC.
Avatar of MECIT

ASKER

I tried a couple but got the same results. These are from the Exchange server, so it is hard to test.

I looked at the other servers and their NICs do not all have the same options.
So, it might be some other option. But it's not the switch; your server is causing the problem.
I don't know whether you have updated your server or not. Using Baseline Security Analyzer you could check for missing updates, as I suggested before. I don't have any more ideas. Maybe some other experts could look at this issue.
ASKER CERTIFIED SOLUTION
Avatar of MECIT
MECIT

But you checked throughput between the laptop and server directly using a crossover cable. So there is still doubt whether changing the switch will resolve the problem.
Avatar of MECIT

ASKER

I know. I will keep researching, but for now the switch needs to be replaced anyway.
Avatar of MECIT

ASKER

Thanks for all the help. They are replacing the switch because they believe it is showing other symptoms of failing hardware.