MECIT
asked on
Network tools for Networking Issues
What tools could I use to troubleshoot networking issues?
We are seeing intermittent network problems between two servers.
One is virtual and the other is physical.
The outages last 20 seconds or more before pinging starts again.
What sort of network hardware do you have between the two servers? You could use the logging feature of any Cisco equipment to identify collisions and such.
ASKER
I am using three Dell PowerConnect 6248 gigabit switches.
I ran tracert on both computers.
Each showed 1 hop at 1 ms to the other server.
What kind of network adapters do you have on the machines?
I have seen this exact issue with Atheros NICs.
Have you tried connecting different computers to the virtual to figure out whether it's an issue with the virtual or the other computer?
Have you updated the NIC drivers to the latest versions?
If this is a VMware VM, I would look at all your VMs' resources and make sure you're not ballooning.
Also, is it only between this one VM on this host and this one physical box? Or can you run a constant ping to the VM from any device on your network and have the ping fail, or vice versa with the physical box?
It seems odd that you only have issues with just these two specific devices talking to each other. Also, if possible, you could try moving to different switch ports.
If it is only these two boxes, are you pinging by IP or host name?
Have there been any network or hardware changes recently which may have triggered this, or is one of these servers new?
ASKER
The virtual is using a Broadcom NetXtreme II BCM5708 and an Intel 82575GB.
The physical is using a Broadcom NetXtreme II BCM5708.
I'm looking to see if there are new drivers for the physical.
First, check with ping:
From server A, ping server A and server B at the same time.
From server B, ping server B and server A at the same time.
What do you see? Is there any packet loss to its own interface?
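A rough sketch of how the two ping logs could be compared, in case it helps. It assumes each ping has already been reduced to a (seconds since start, reply received) pair; the function names and the 5 second cutoff are made up for illustration and are not part of any ping tool.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]  # (seconds since start, reply received?)


def find_outages(samples: List[Sample], min_duration: float = 5.0) -> List[Tuple[float, float]]:
    """Return (first_failure, last_failure) windows of consecutive lost pings
    that span at least min_duration seconds. The cutoff is an assumption."""
    outages = []
    start = None
    last_fail = None
    for t, ok in samples:
        if not ok:
            if start is None:
                start = t
            last_fail = t
        elif start is not None:
            if last_fail - start >= min_duration:
                outages.append((start, last_fail))
            start = None
    # Close a window that is still open when the log ends.
    if start is not None and last_fail - start >= min_duration:
        outages.append((start, last_fail))
    return outages


def classify(own: List[Sample], peer: List[Sample]) -> str:
    """Compare the ping to the server's own address with the ping to the peer."""
    if find_outages(own):
        return "loss to own interface: suspect the local NIC, driver, or host"
    if find_outages(peer):
        return "loss only to the peer: suspect the path or the remote host"
    return "no sustained loss seen"
```

If the ping to the server's own address never drops while the ping to the other server does, the local stack is probably fine and the path or the far end deserves the attention.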
ASKER
I have updated the firmware on the switches and I am updating the drivers on the VM Hosts.
Once completed I will run the tests again.
On the PowerConnect 6248, is there an option to monitor the ports that are connected to the physical server?
How can I troubleshoot from the switches?
ASKER
Server A and Server B are pinging at 1 ms, but after a few 1 ms replies they both increase, to 13 ms on one and 10 ms on the other.
Then they both return to 1 ms.
You have 3 switches between the 2 servers. Are they manageable, with IP addresses assigned?
If so, ping every switch step by step to find where the latency is high.
You also need to check the traffic usage of each switch. Do these switches support SNMP?
If they do, you could configure Cacti or MRTG and check the utilization of each port. This could give you an idea of where the problem is.
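For anyone wondering what those graphing tools compute per port, here is a minimal sketch. It assumes you already have two readings of an interface octet counter (for example IF-MIB ifInOctets) taken a known number of seconds apart; the function names are hypothetical and no SNMP library is involved.

```python
COUNTER32_MAX = 2 ** 32  # ifInOctets / ifOutOctets are 32-bit counters


def octet_delta(previous: int, current: int) -> int:
    """Difference between two Counter32 readings, allowing for one wrap."""
    if current >= previous:
        return current - previous
    return COUNTER32_MAX - previous + current


def utilization_percent(previous: int, current: int,
                        interval_s: float, link_bps: float) -> float:
    """Average utilization of a port over the polling interval."""
    bits = octet_delta(previous, current) * 8
    return 100.0 * bits / (interval_s * link_bps)


# Example with invented readings: 37,500,000 octets in 300 s on a gigabit port.
print(utilization_percent(0, 37_500_000, 300, 1_000_000_000))
```

On a gigabit port a 32-bit counter can wrap in roughly half a minute at full line rate, so with long polling intervals the 64-bit counters are the safer choice where the switch offers them.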
ASKER
They are managed and stacked with one IP.
This is what I get when pinging the switch:
1ms
2ms
2ms
1ms
5ms
1ms
1ms
1ms
2ms
3ms
2ms
1ms
1ms
1ms
2ms
6ms
1ms
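A quick way to summarize a run like the one above, as a sketch only. The values are copied from the ping output; the 3 ms "spike" threshold is an arbitrary assumption.

```python
from statistics import mean

# Round-trip times in ms, copied from the ping output above.
rtts_ms = [1, 2, 2, 1, 5, 1, 1, 1, 2, 3, 2, 1, 1, 1, 2, 6, 1]


def summarize(rtts, spike_ms=3):
    """Return min, mean, max and the number of replies slower than spike_ms."""
    return {
        "min": min(rtts),
        "mean": mean(rtts),
        "max": max(rtts),
        "spikes": sum(1 for r in rtts if r > spike_ms),
    }


print(summarize(rtts_ms))
```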
Latency can rise for several reasons:
high usage, a memory/CPU issue, or a firewall or routing issue, etc.
However, 6 ms alone shouldn't be a problem for communication or data transfer between two servers.
You indicated that you are facing a network issue. What exact problem are you facing? Slow data transfer? Packet loss? Or high latency?
ASKER
The results were from my laptop to the switch.
We are having slow data transfers and some packet loss.
Please check from your server to the switch. Start with a ping test, then check throughput.
To check throughput between two servers you can use iperf:
http://openmaniak.com/iperf.php
Download for the Linux version:
http://sourceforge.net/projects/iperf/files/latest/download
For the Windows version:
http://www.mayoxide.com/iperf/iperf-2.0.5-cygwin.zip
ASKER
Do I install the application on both servers?
Could I install it on my laptop, or does it need to be a server?
Iperf is a tool to check throughput between two points. You can install iperf on both servers to check throughput between them. Or, if you want to check throughput between a laptop and a server, install one copy on the laptop and another on the server.
But throughput can vary with computer performance, configuration, or even operating system. So it is better to test between the two servers.
ASKER
Do I just place a copy on the C: drive of each server?
How do I get it to install?
Does this open its own application, or am I going to be running it in the command prompt?
ASKER
These are the server-side results
Interval - 10.2 secs
Transfer - 6.38 MBytes
Bandwidth - 5.23 Mbits/sec
These are the client side results
Interval - 10.2 secs
Transfer - 6.38 MBytes
Bandwidth - 5.27 Mbits/sec
What do the results mean?
That is very poor performance.
Keep a copy of iperf on both servers. Then open a command prompt on the server and go to the iperf directory with the 'cd' command.
Then run "iperf -s" //iperf -s indicates server mode
From the other server use
iperf -c "Iperf Server IP" //This is client mode.
If you get the same results, it means there is some problem in the network.
In that case I'd suggest checking throughput step by step:
1. Server to laptop using a cable
2. Server to laptop through the switch.
Then you can work out what is causing the low throughput.
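To put the numbers reported earlier in context, here is a small sketch that parses an iperf summary line and recomputes the rate. The sample line is reconstructed from the asker's figures and its exact layout is an assumption, as is the unit convention (MBytes as 2^20 bytes, Mbits as 10^6 bits) and the gigabit link speed.

```python
import re

# Assumed layout of an iperf 2 summary line.
LINE = re.compile(
    r"(?P<start>[\d.]+)-\s*(?P<end>[\d.]+)\s+sec\s+"
    r"(?P<transfer>[\d.]+)\s+MBytes\s+"
    r"(?P<rate>[\d.]+)\s+Mbits/sec"
)


def parse_summary(line: str):
    """Return (seconds, transfer in MBytes, reported rate in Mbits/sec)."""
    m = LINE.search(line)
    if m is None:
        raise ValueError("not an iperf summary line: %r" % line)
    seconds = float(m["end"]) - float(m["start"])
    return seconds, float(m["transfer"]), float(m["rate"])


def recomputed_mbits(seconds: float, mbytes: float) -> float:
    """Rate implied by transfer and interval (assumed unit convention)."""
    return mbytes * 2 ** 20 * 8 / seconds / 1e6


def percent_of_link(mbits: float, link_mbits: float = 1000.0) -> float:
    """Share of the link speed actually achieved; gigabit is assumed."""
    return 100.0 * mbits / link_mbits


seconds, mbytes, rate = parse_summary("[  3]  0.0-10.2 sec  6.38 MBytes  5.23 Mbits/sec")
print(recomputed_mbits(seconds, mbytes), percent_of_link(rate))
```

On those assumptions the transfer and interval agree with the reported bandwidth, and about 5 Mbits/sec is only around half a percent of a gigabit link, which is why the result counts as very poor.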
ASKER
I called Dell tech support about the dropped packets.
They stated it might have to do with the spanning tree settings. But they are only doing a best effort, since I only have hardware support.
I have no idea what spanning tree does or doesn't do. Do you think that could be the issue?
Of course, spanning tree could be an issue for throughput. But there are other configurations that could also cause slow transfers.
To learn about spanning tree:
http://www.enterprisenetworkingplanet.com/netsp/article.php/3580966/Networking-101-Understanding-Spanning-Tree.htm
ASKER
What other configurations could affect it?
I set some ports to PortFast. I set the root ID to 4096 to allow the switch to become the root under spanning tree. For some reason, the root ID was pointing to a WAP.
This was the best-effort advice from Dell.
I am still seeing dropped packets, and I also ran the iperf test again and had the same results.
What else can I do?
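For readers unfamiliar with the root setting mentioned above, this is a simplified sketch of how spanning tree picks a root: the bridge with the lowest bridge ID wins, comparing priority first and MAC address second. The device names, MAC addresses, and the default priority of 32768 on both devices are invented for illustration; nothing here is taken from the actual switch configuration.

```python
from typing import List, NamedTuple, Tuple


class Bridge(NamedTuple):
    name: str
    priority: int  # lower wins
    mac: str       # tie-breaker, lower wins


def bridge_id(bridge: Bridge) -> Tuple[int, int]:
    """Bridge ID as a sortable (priority, MAC as integer) pair."""
    return bridge.priority, int(bridge.mac.replace(":", ""), 16)


def elect_root(bridges: List[Bridge]) -> Bridge:
    """The bridge with the lowest bridge ID becomes the root."""
    return min(bridges, key=bridge_id)


# Hypothetical devices: equal priorities, so the lower MAC decides.
before = [
    Bridge("core-switch", 32768, "00:1e:c9:aa:bb:01"),
    Bridge("wap", 32768, "00:0b:86:11:22:33"),
]
# Lowering the switch priority to 4096 makes it win regardless of MAC.
after = [
    Bridge("core-switch", 4096, "00:1e:c9:aa:bb:01"),
    Bridge("wap", 32768, "00:0b:86:11:22:33"),
]
print(elect_root(before).name, elect_root(after).name)
```

That is one plausible way an access point could end up as root, and why setting a low priority on the core switch moves the root back to it.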
Bypass the switch and connect your server and laptop directly with a cable. Then run an iperf test between the laptop and the server. Make sure the server and laptop performance is OK. Then you can move on to the switch.
ASKER
The majority of our servers are virtual. The physical servers are our critical servers, and at this time I cannot unplug the NIC to do the testing.
Can I do it from desktop to laptop? Will this work?
OK, you could first check between the desktop and laptop using a cable. Then connect both the laptop and desktop to the same switch and test again.
ASKER
Here are my results:
Desktop to laptop
Interval - 10 sec
Transfer - 68 MBytes
Bandwidth - 57 Mbits/sec
Desktop to switch to laptop
Interval - 10 sec
Transfer - 90.4 MBytes
Bandwidth - 75.8 Mbits/sec
Server to laptop
Interval - 10.2 sec
Transfer - 6.25 MBytes
Bandwidth - 5.16 Mbits/sec
ASKER
For the laptop to switch to desktop test, it was connected to a different switch.
You can pin it down if you test through the same switch that you used for the server. Also check your servers' NIC speed settings.
ASKER
I did not understand the first part of your comment.
Both servers - Nic speed is set to auto
Use the same switch that is connected to your server. Connect your laptop and desktop to 2 free ports of that switch and run the test between laptop and desktop. Then you can find out if there is any difference.
ASKER
There was.
These are the server to laptop results:
Interval - 10.2 sec
Transfer - 6.25 MBytes
Bandwidth - 5.16 Mbits/sec
You didn't get my point.
You already tested between laptop and desktop, which looks fine. But you used a different switch.
So now use the switch which you used for the server to laptop test.
Please check if there are 2 free ports on that switch. Then connect both the laptop and desktop to that switch and test between laptop and desktop. Basically, you are going to test that switch.
ASKER
Here are the results:
Interval - 10 sec
Transfer - 6.12 MBytes
Bandwidth - 5.14 Mbits/sec
So, it's the configuration of your switch.
Could you please disable spanning tree for a while? Then you can test again to see if spanning tree is the reason. You can check the other configurations one by one too.
ASKER
Would it affect the network more if I disable spanning tree?
Will the switch need to reboot?
You can disable spanning tree temporarily. After the test you can enable it again. I don't think it requires a reboot. However, depending on the switch model, if it requires a reboot it will prompt you.
Before changing any configuration, it is better to take a screenshot of each page or a configuration backup. It will help you restore the configuration afterwards.
ASKER
I have disabled spanning tree and flow control.
Still getting the same results
So, it's not spanning tree but some other configuration of your switch. Are you using VLANs? Try connecting to other ports.
ASKER
No VLANs.
I connected two laptops to each Dell switch.
1st test: 223 Mbits/sec, laptop 1 server; laptop 2 client
2nd test: 141 Mbits/sec, laptop 2 server; laptop 1 client
Then I kept the same switch and ports on the switch, with a laptop and the physical server.
3rd test: 84.3 Mbits/sec, server was the server; laptop was client
4th test: 4.98 Mbits/sec, server was client; laptop was server, same ports and same switch
Which switch are you using? Simply run
iperf -c "server IP" //for client
iperf -s //for server
If it is the same port and same switch in both cases, you should get the same throughput. However, it seems some port configuration is causing different throughput in each traffic direction.
Finally, to be sure, use laptop2 and the physical server:
Physical server as server and laptop2 as client
Laptop2 as server and physical server as client.
If you get the same result, it must be your switch. It might be some switchport configuration.
ASKER
Dell support recommended breaking the stack.
Sw 1 prt 1 --> Sw2 prt 1 ; Sw2 prt 2 --> Sw3 prt 1
If I do the iperf test from laptop to laptop I am getting
iperf -c x.x.x.x -d
transfer bandwidth
244MB 205Mb
235MB 195Mb
If I do the iperf test from laptop to server I get
iperf -c x.x.x.x -d
transfer bandwidth
271MB 227Mb
6MB 5.03Mb
iperf -c x.x.x.x -d measures throughput in both directions. From your test we can conclude that the server's upload throughput is too low. So there are only two possible reasons:
1. Server upload capacity
2. Configuration of the switchport connected to that server.
If possible, connect your server to another switchport that you have already tested. If the throughput is the same, the problem is your server. Otherwise, it's that specific switchport.
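To make the comparison concrete, here is a tiny sketch that flags a one-sided result from a bidirectional run. The function names and the 5x threshold are arbitrary assumptions; the figures in the usage lines are the ones posted above.

```python
def asymmetry_ratio(a_mbits: float, b_mbits: float) -> float:
    """Ratio of the faster direction to the slower one."""
    low, high = sorted((a_mbits, b_mbits))
    if low <= 0:
        raise ValueError("throughput must be positive")
    return high / low


def one_sided(a_mbits: float, b_mbits: float, threshold: float = 5.0) -> bool:
    """True when one direction is at least threshold times slower."""
    return asymmetry_ratio(a_mbits, b_mbits) >= threshold


print(one_sided(205, 195))   # laptop to laptop
print(one_sided(227, 5.03))  # laptop to server
```

The laptop pair is nearly symmetric, while the server pair differs by a factor of about 45, which points at the server's sending side or its port.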
ASKER
We did plug it into another switch and port, with the same results.
If it's the server, what would cause that? We have tested iperf on 4 or 5 servers and come back with the same results.
Are those physical servers? Do they have the same NIC model?
ASKER
Yes, it is a physical server, and we purchased a new NIC for the server.
We used a different switch and new cables that were tested.
We get the same results.
How do we improve the server throughput?
So, it's clear that your server is the reason for this.
However, you need to check server performance first. Depending on the server OS, there are many server performance tools available.
1. Be sure overall server performance is OK.
2. Check memory and CPU usage while you are downloading or running the iperf test.
3. Try disabling antivirus (if there is any).
4. Stop other unnecessary applications/programs/services, then try again.
ASKER
I also get this on a few servers:
Exception: STATUS_ACCESS_VIOLATION at eip=6110D923
eax=00000014 ebx=00000000 ecx=FFFFFFFF edx=00000014 esi=00000001 edi=00000014
ebp=1A22C858 esp=1A22C854 program=C:\iperf\iperf.exe , pid 7892, thread unknown (0x2F54)
cs=001B ds=0023 es=0023 fs=003B gs=0000 ss=0023
Stack trace:
Frame Function Args
1A22C858 6110D923 (00000014, 1A22CB1A, 00410B8E, 00000001)
1A22CB38 61142926 (1A22D000, 1A22CB58, 00410B88, 1A22CBF8)
1A22CBD8 61118839 (004161F8, 00000020, 00410B88, 9F63D75A)
1A22CC18 610C01A5 (004161F8, 00000020, 30ED8C00, 42C27300)
1A22CC68 00403F9C (00EC7B20, 00000000, 1A22CCB8, 610E1469)
1A22CC88 004056A0 (00EC7AC0, 00000001, 00000001, 05A1DB2E)
1A22CCC8 004059F5 (00EC7AC0, 00000000, 00000001, 610713B0)
1A22CCF8 00405D19 (00EC7AB8, 00000000, 00000000, 00000000)
1A22CD18 00405DCD (00EC7AB8, 00412088, 00000000, 00000000)
1A22CD38 00405FBE (00E31088, 00000000, 00000000, 00E512C8)
1A22CD58 0040980D (00E31088, 00000000, 00000000, 00000000)
1A22CD98 610E38C5 (00E512C8, 1A22CDD4, 610E3810, 00E512C8)
End of stack trace
ASKER
I am also getting some of these
connect failed: Connection timed out
It might be a corrupted file, or antivirus or a firewall is blocking it.
ASKER
We tried all 4 steps.
We are getting the same results.
I am going through all our servers, virtual and physical, running the iperf test. So far all have the same results.
Should the results be the same as laptop to laptop, around 100 to 200 Mbps?
Could you please check server to laptop directly (without using a switch) with a crossover cable?
ASKER
Same results.
Which OS do you use for your servers?
ASKER
Windows 2008 and 2003
Download Microsoft Baseline Security Analyzer and check which updates are missing:
http://www.microsoft.com/download/en/details.aspx?id=7558
Also check the Windows Server 2003 Performance Advisor:
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=15506
For Windows 2008:
http://msdn.microsoft.com/en-us/library/windows/hardware/gg463394.aspx
Also check the Event Log for any kind of error.
Also post your server network card settings. Go to Device Manager > NIC Properties > Advanced. Then check all the options one by one and, if possible, post them here.
ASKER
Intel Pro/1000 MT
Gigabit Master Slave Mode - Auto Detect
Jumbo Frames - Disabled
Locally Administered Address - empty
Log Link State Event - Enabled
Performance Options -
Adaptive Inter-frame spacing - Enabled
Flow Control - Generate & Respond
Interrupt Moderation Rate - Adaptive
Receive Descriptors - 256
Transmit Descriptors - 256
Qos Packet Tagging - Disabled
TCP/IP Offloading options - Everything is checked off
Wait for Link - Auto Detect
Change the Receive and Transmit Descriptors to each of the following values and test with iperf:
1. RxD:128 TxD: 128
2. RxD:256 TxD: 128
3. RxD:256 TxD: 64
4. RxD:256 TxD: 32
5. RxD:256 TxD: 16
6. RxD:128 TxD: 64
7. RxD:64 TxD: 64
8. RxD:16 TxD:256
9. RxD:16 TxD:128
Collect the iperf results and see if there are any differences.
NB: each time you change a value, disable and re-enable the NIC.
ASKER
I tried a couple, but got the same results. These are from the Exchange server, so it is hard to test.
I looked at the other servers, and they do not all have the same options on the NIC.
So, it might be some other option. But it's not the switch; your server is causing the problem.
I don't know whether you have updated your server or not. Using Baseline Security Analyzer you can check for missing updates, as I suggested before. I don't have any more ideas. Maybe some other experts could look at this issue.
ASKER CERTIFIED SOLUTION
This solution is only available to members.
But you checked throughput between the laptop and server directly using a crossover cable. So there's still doubt whether changing the switch would resolve the problem or not.
ASKER
I know. I will keep researching, but for now the switch needs to be replaced anyway.
ASKER
Thanks for all the help. They are replacing the switch because they believe it has other hardware symptoms that are failing.
Wireshark is a free packet analyzer: http://www.wireshark.org/
Also try running "tracert ip-of-other-comp" at a DOS prompt. This will show every gateway in between.