Solved

Diagnose Server Network Card discards

Posted on 2011-09-08
Last Modified: 2012-05-12
We're running Windows Server 2008 R2 on about 20 Dell PowerEdge servers. They all have the same network card settings, but some are 2950s from a few years ago, others are R710s, and others are R510s.

All of the servers have 2 onboard NICs enabled with iSCSI support and have the network settings below. However, on every 2950, one of the two NICs has discards for stretches of time that appear during the busiest times on the servers. The servers have an internal C drive (RAID1) and a 4-disk RAID10 set for part of our database (SQL). They also have at least 2 volumes presented by two different SANs via iSCSI. I have updated the drivers to the latest driver available from Dell.

We can see the discards in Perfmon and via our SNMP monitoring software. The discards are only on that one NIC, not on the switch they are plugged into. The switch connects many servers, not just the three affected 2950s. It is a Foundry/Brocade 48-port gigabit switch.

I have verified that all ports on the switch have the same settings, and I have set the speed and duplex statically on both the switch ports and the servers in case the auto setting was the cause.

The only thing that's unique about the port on each server that's having the problem is that it's the port with the gateway. The other NIC is on the same subnet but does not have a gateway.

Can anyone help me figure out why these NICs are discarding so many packets (2000+ per day, sometimes 4000+)? Thanks!

Settings across all servers (if the NIC has the setting):

ethernet @ wirespeed             En
flow control                     Tx & Rx En
interrupt moderation             Disabled
ipv4 checksum offload            None
ipv4 large send offload          Disabled
ipv6 checksum offload            None
ipv6 large send offload          Disabled
jumbo mtu                        9000
locally admin address            Not Pres
number of rss queues             8
Pause On Exh. Host Ring          Disabled
priority & vlan                  P & V En
receive buffers                  3000
receive side scaling             Disabled
speed & duplex                   1 GB Full Auto
tcp connection offload (ipv4)    Disabled
tcp connection offload (ipv6)    Disabled
transmit buffers                 5000
vlan id                          0
VMQ Look Ahead Split             Disabled
Power Management                 Off
Question by:MrVault
11 Comments
 
LVL 22

Accepted Solution

by:
eeRoot earned 200 total points
ID: 36509964
Can you try a different cable and a different port on the switch? Can you lower the MTU from 9000 to 1500 for a few hours? Are there any errors in the Windows event log or the switch log?
 

Author Comment

by:MrVault
ID: 36510122
I can't really drop it down to 1500. This is serving iSCSI on a SAN. That will definitely cause issues because the SAN is presenting them at 9000.

The cable and port are the only things we haven't tried yet. Two issues with that though. First, what are the chances that the first port on all three 2950s, or the cable in the first port, or the switch ports for just those three servers are all having issues? It's never the second port on any of them. Second, we have no more free ports. So that option will have to wait.

Any other ideas?
 
LVL 22

Assisted Solution

by:eeRoot
eeRoot earned 200 total points
ID: 36510950
The last thing I could recommend trying is to disable flow control and see if that reduces the number of errors. Aside from that, the only other thing I can think of would be to call Dell and see if they have any recommendations or design considerations for iSCSI connections on these NICs.
 

Author Comment

by:MrVault
ID: 36510996
Thanks. We did contact Dell and are running what they recommend. I'm a bit leery of disabling flow control because when we forget to enable that on other servers, these issues appear. Only by enabling it do they go away. Thanks for the suggestions!
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 300 total points
ID: 36511299
I would not look at the number of packets being discarded; I would look at the percentage of packets.

I know that 2000 - 4000 packets a day sounds like a lot, but if you have 4 million packets a day and drop 4,000 of them, that's 1 out of every 1,000 packets, or 1/10 of a percent.
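
To make that arithmetic concrete, here is a minimal Python sketch. The function name `discard_percentage` and the sample figures are illustrative only, and it assumes the packet total covers the same period as the discard count.

```python
def discard_percentage(discards: int, total_packets: int) -> float:
    """Return discards as a percentage of total packets."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    return 100.0 * discards / total_packets


# Hypothetical day: 4,000 discards out of 4 million packets.
pct = discard_percentage(4_000, 4_000_000)
print(f"{pct:.3f}% discarded, about 1 in {round(100 / pct):,} packets")
```

The same function can be fed a daily discard count from a monitoring tool, provided a packet total for the same day is also available.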


Are these NICs dedicated to iSCSI traffic?

Are you getting any traffic over the other NIC?

Is the iSCSI SAN on the same IP subnet as the server?

Author Comment

by:MrVault
ID: 36511426
The SAN is on the same subnet. Right now, like all the other servers, these NICs serve up both storage and regular non-iSCSI traffic. That will be going away.

Is there a Perfmon stat I can monitor to see % packet discards or errors?
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 300 total points
ID: 36512361
Not that I am aware of.

But the output of netstat -e shows the number of packets (unicast and non-unicast) received, discarded, and in error.

Are the discards outbound or inbound from the server's point of view?
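
Since there is no ready-made percentage counter mentioned here, one option is to derive it from the netstat -e text. The sketch below is an assumption-laden illustration: the helper names `parse_netstat_e` and `discard_percentages` are made up, the sample text uses invented numbers, and it assumes English-language row labels and that "packets" means unicast plus non-unicast.

```python
import re

# Matches rows such as "Discards    500    0"; the second number is optional
# because "Unknown protocols" has only a Received column.
ROW = re.compile(r"^\s*([A-Za-z][A-Za-z\- ]*?)\s+(\d+)(?:\s+(\d+))?\s*$")


def parse_netstat_e(text: str) -> dict:
    """Parse `netstat -e` output into {row label: (received, sent or None)}."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            label, received, sent = match.groups()
            stats[label] = (int(received), int(sent) if sent is not None else None)
    return stats


def discard_percentages(stats: dict) -> dict:
    """Return discard percentage for each direction (received, sent)."""
    result = {}
    for name, idx in (("received", 0), ("sent", 1)):
        packets = stats["Unicast packets"][idx] + stats["Non-unicast packets"][idx]
        discards = stats["Discards"][idx]
        result[name] = 100.0 * discards / packets if packets else 0.0
    return result


# Invented sample in the same layout as real `netstat -e` output.
SAMPLE = """Interface Statistics

                           Received            Sent

Bytes                      10000000         5000000
Unicast packets              900000          400000
Non-unicast packets          100000           50000
Discards                        500               0
Errors                            0               0
Unknown protocols                 0
"""

print(discard_percentages(parse_netstat_e(SAMPLE)))
```

To run it against a live server, the text could come from `subprocess.run(["netstat", "-e"], capture_output=True, text=True).stdout` instead of `SAMPLE`.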
 

Author Comment

by:MrVault
ID: 36531223
Here's our output for a few servers. Not sure how it computes these with multiple NICs, nor why it appears to happen on one NIC vs the other.

C:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3019229595      2112537785
Unicast packets          1932686966       370425405
Non-unicast packets        39692891           49207
Discards                       9345            9345
Errors                            0               0
Unknown protocols                 0



and

c:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3468778696      2839839847
Unicast packets            54670585        96118013
Non-unicast packets        39878052           50384
Discards                      26748           26748
Errors                            0               0
Unknown protocols                 0



and lastly

c:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3784049537      1700450048
Unicast packets          1317853860      4186925096
Non-unicast packets        39766172           48379
Discards                          0               0
Errors                            0               0
Unknown protocols                 0


 

Author Comment

by:MrVault
ID: 36531238
Not sure what time period netstat is based on. Since last reboot? Last 24 hours? Today?
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 300 total points
ID: 36531695
The -e output is the combined total of all Ethernet interfaces. So if you have two or more interfaces, you get one set of numbers that represents all interfaces.

The counters are since last reboot or counter wrap, I don't know if the counter is 32 or 64 bit.   My guess would be 32-bit on 32-bit OS and 64-bit on 64-bit OS.
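
Because the counters are cumulative and may wrap, a per-day figure has to come from the difference between two samples. A minimal Python sketch follows; `counter_delta` is a hypothetical helper, the sample values are invented, and the counter width is a parameter because this thread does not establish it. Every figure posted above is below 2**32 (4,294,967,296), which is consistent with 32-bit counters but does not prove it.

```python
def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Difference between two samples of a counter that wraps at 2**bits.

    Assumes at most one wrap and no reboot between the two samples.
    """
    modulus = 1 << bits
    return (current - previous) % modulus


# Invented samples taken a day apart.
print(counter_delta(9_000, 9_345))        # no wrap: 345 new discards
print(counter_delta(4_294_967_000, 704))  # wrapped past 2**32: 1000 new packets
```

A reboot between samples resets the counters to zero, which this arithmetic would misread as a wrap, so uptime should be checked alongside the counters.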

Are you actually having problems?

Your worst case is just under 0.03% discard. That is about 1 packet out of every 3,300 packets. Now, I would expect it to be zero, but you don't always get what you expect.
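
As a rough check of that figure, the sketch below recomputes the received-side discard rate from the three netstat -e outputs posted above. It assumes received packets means unicast plus non-unicast, and it cannot tell whether discarded packets are already included in those totals; `received_discard_pct` is an illustrative name.

```python
def received_discard_pct(unicast: int, non_unicast: int, discards: int) -> float:
    """Received discards as a percentage of received packets."""
    return 100.0 * discards / (unicast + non_unicast)


# (label, received unicast, received non-unicast, received discards),
# copied from the three posted outputs.
POSTED = [
    ("first output", 1_932_686_966, 39_692_891, 9_345),
    ("second output", 54_670_585, 39_878_052, 26_748),
    ("third output", 1_317_853_860, 39_766_172, 0),
]

for label, unicast, non_unicast, discards in POSTED:
    pct = received_discard_pct(unicast, non_unicast, discards)
    packets = unicast + non_unicast
    one_in = f"1 in {packets // discards:,}" if discards else "no discards"
    print(f"{label}: {pct:.4f}% ({one_in})")
```

The second output works out to about 0.028%, roughly 1 in 3,500, which agrees with "just under 0.03%". Because netstat -e combines all interfaces, the rate on the one affected NIC could be higher than this combined figure.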

The only thing I find weird is that your received and sent discard counts match exactly. I would not expect that at all.
 

Author Comment

by:MrVault
ID: 36531822
It's definitely a 64-bit OS.

I don't know for sure if we're having problems. These cards primarily serve up iSCSI volumes that have parts of a SQL database on them, so we're concerned about data integrity. We have seen cases in the past where enough of these errors happen that a volume actually drops off, as if someone had pulled the drive out of the server (if it were physical). Our monitoring application puts the NIC in red (warning) if it exceeds 1000 discards in a single day. It doesn't know percentages.