Solved

Diagnose Server Network Card discards

Posted on 2011-09-08
Last Modified: 2012-05-12
We're running Windows Server 2008 R2 on Dell Poweredge servers. We have about 20. They all have the same network card settings, but some servers are 2950's from a few years ago, others are R710 and others are R510.

All of the servers have 2 onboard NICs enabled with iSCSI support and have the network settings below. However, on every 2950, one of the two NICs shows discards for stretches of time that coincide with the busiest periods on the servers. The servers have an internal C drive (RAID1) and a 4 disk RAID10 set for part of our database (SQL). They also have at least 2 volumes presented by two different SANs via iSCSI. I have updated the drivers to the latest driver available from Dell.

The discards we can see in Perfmon and via our SNMP monitor software. The discards are only on that one NIC, not on the switch they are plugged into. The switch connects many servers, not just these three. The Switch is a Foundry/Brocade 48 port gigabit switch.

I have verified that all ports on the switch have the same settings, and I have set the speed and duplex statically on both the switch ports and the servers, just in case the auto setting was the cause.

The only thing that's unique about the port on each server that's having the problem is that it's the port with the gateway. The other NIC is on the same subnet but does not have a gateway.

Can anyone help me figure out why these NICs are discarding so many packets (2000+ per day, sometimes 4000+)? Thanks!

Settings across all servers (if the NIC has the setting):

Setting                          Value
ethernet @ wirespeed             Enabled
flow control                     Tx & Rx Enabled
interrupt moderation             Disabled
ipv4 checksum offload            None
ipv4 large send offload          Disabled
ipv6 checksum offload            None
ipv6 large send offload          Disabled
jumbo mtu                        9000
locally admin address            Not Present
number of rss queues             8
Pause On Exh. Host Ring          Disabled
priority & vlan                  Priority & VLAN Enabled
receive buffers                  3000
receive side scaling             Disabled
speed & duplex                   1 GB Full Auto
tcp connection offload (ipv4)    Disabled
tcp connection offload (ipv6)    Disabled
transmit buffers                 5000
vlan id                          0
VMQ Look Ahead Split             Disabled
Power Management                 Off
Question by:MrVault
11 Comments
 
LVL 22

Accepted Solution

by:
eeRoot earned 800 total points
ID: 36509964
Can you try a different cable and a different port on the switch? Can you lower the MTU from 9000 to 1500 for a few hours? Are there any errors in the Windows event log or the switch log?
 

Author Comment

by:MrVault
ID: 36510122
I can't really drop it down to 1500. This is serving iSCSI on a SAN. That will definitely cause issues because the SAN is presenting them at 9000.

The cable and port are the only things we haven't tried yet. Two issues with that though. First, what are the chances that the 1st port on all three 2950's or the cable in the first port, or the ports on the switch for just those three servers are all having issues? It's not the 2nd on any of them. Second, we have no more free ports. So that option will have to wait.

Any other ideas?
 
LVL 22

Assisted Solution

by:eeRoot
eeRoot earned 800 total points
ID: 36510950
The last thing I could recommend trying is to disable flow control and see if that reduces the number of errors. Aside from that, the only other thing I can think of would be to call Dell and see if they have any recommendations or design considerations for iSCSI connections on these NICs.

 

Author Comment

by:MrVault
ID: 36510996
Thanks. We did contact Dell and are running what they recommend. I'm a bit leery of disabling flow control because when we forget to enable that on other servers, these issues appear. Only by enabling it do they go away. Thanks for the suggestions!
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 1200 total points
ID: 36511299
I would not look at the number of packets being discarded; I would look at the percentage of packets.

I know that 2,000 - 4,000 packets a day sounds like a lot, but if you have 4 million packets a day and drop 4,000 of them, that is 1 out of every 1,000 packets, or 1/10 of a percent.


Are these NICs dedicated to iSCSI traffic?

Are you getting any traffic over the other NIC?

Is the iSCSI SAN on the same IP subnet as the server?
 

Author Comment

by:MrVault
ID: 36511426
The SAN is on the same subnet. Right now, like all the other servers, these NICs serve up storage and regular non-iSCSI traffic. That will be going away.

Is there a Perfmon stat I can monitor to see % packet discards or errors?
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 1200 total points
ID: 36512361
Not that I am aware of.

But by looking at the output of netstat -e, you can see the number of packets (unicast and non-unicast) received, discarded and in error.

Are the discards outbound or inbound from the server's point of view?
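Since there is no ready-made percentage, one option is to compute it from the netstat -e text. The following is a minimal Python sketch, not a definitive tool: the function names `parse_netstat_e` and `received_discard_percent` are made up for illustration, the parsing assumes the English-language layout of the output, and the sample text is the first output posted later in this thread.

```python
import re

def parse_netstat_e(text):
    """Return {row label: (received, sent or None)} from `netstat -e` text."""
    stats = {}
    for line in text.splitlines():
        # Data rows look like "Discards   9345   9345"; the last row
        # ("Unknown protocols") has only a received column.
        m = re.match(r"^\s*([A-Za-z][A-Za-z\- ]*?)\s+(\d+)(?:\s+(\d+))?\s*$", line)
        if m:
            sent = m.group(3)
            stats[m.group(1)] = (int(m.group(2)), int(sent) if sent else None)
    return stats

def received_discard_percent(stats):
    """Received discards as a percentage of all received packets."""
    rx_packets = stats["Unicast packets"][0] + stats["Non-unicast packets"][0]
    return 100.0 * stats["Discards"][0] / rx_packets

sample = """Interface Statistics

                           Received            Sent

Bytes                    3019229595      2112537785
Unicast packets          1932686966       370425405
Non-unicast packets        39692891           49207
Discards                       9345            9345
Errors                            0               0
Unknown protocols                 0
"""

stats = parse_netstat_e(sample)
print(f"received discards: {received_discard_percent(stats):.6f}%")
```

Swapping index 0 for index 1 in `received_discard_percent` would give the sent-side figure instead.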
 

Author Comment

by:MrVault
ID: 36531223
Here's our output for a few servers. Not sure how it computes these with multiple NICs, nor why it appears to happen on one NIC vs the other.

C:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3019229595      2112537785
Unicast packets          1932686966       370425405
Non-unicast packets        39692891           49207
Discards                       9345            9345
Errors                            0               0
Unknown protocols                 0



and

c:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3468778696      2839839847
Unicast packets            54670585        96118013
Non-unicast packets        39878052           50384
Discards                      26748           26748
Errors                            0               0
Unknown protocols                 0



and lastly

c:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3784049537      1700450048
Unicast packets          1317853860      4186925096
Non-unicast packets        39766172           48379
Discards                          0               0
Errors                            0               0
Unknown protocols                 0

 

Author Comment

by:MrVault
ID: 36531238
Not sure what time period netstat is based on. Since last reboot? Last 24 hours? Today?
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 1200 total points
ID: 36531695
The -e output is the combined total of all Ethernet interfaces. So if you have two or more interfaces, you get one set of numbers that represents all interfaces.

The counters are since last reboot or counter wrap, I don't know if the counter is 32 or 64 bit.   My guess would be 32-bit on 32-bit OS and 64-bit on 64-bit OS.
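Because the counters can wrap, a per-day figure is safer to compute as the difference between two samples rather than by reading the raw total. The sketch below is only an illustration: `counter_delta` is a hypothetical helper, the 32-bit default is an assumption (the counter width is not confirmed in this thread), and it is only correct if the counter wrapped at most once between the two samples. As a hint that wrapping does occur here, the first output above shows about 3.0 billion received bytes against roughly 1.97 billion received packets, which is under 2 bytes per packet and suggests the byte counter has already wrapped.

```python
def counter_delta(previous, current, bits=32):
    """Increase between two samples of a counter that wraps at 2**bits.

    Assumes at most one wrap happened between the two samples.
    """
    modulus = 1 << bits
    return (current - previous) % modulus

# No wrap: the counter simply grew by 150.
print(counter_delta(100, 250))
# Wrap: the counter passed 2**32 and started again from zero.
print(counter_delta(4294967000, 704))
```

Pass `bits=64` if the counters being sampled turn out to be 64-bit.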

Are you actually having problems?

Your worst case is just under 0.03% discard. That is 1 packet out of every 3,300 packets. Now, I would expect it to be zero, but you don't always get what you expect.
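The arithmetic can be checked against the three outputs posted above. This is a small Python sketch using the received-side numbers exactly as pasted; the labels "output 1" to "output 3" and the helper `discard_stats` are just names chosen here, and the totals are the combined figures for all interfaces on each server, not per-NIC values.

```python
# (received unicast, received non-unicast, received discards) as posted above
samples = {
    "output 1": (1932686966, 39692891, 9345),
    "output 2": (54670585, 39878052, 26748),
    "output 3": (1317853860, 39766172, 0),
}

def discard_stats(unicast, non_unicast, discards):
    """Return (discard percentage, packets per discard or None if no discards)."""
    total = unicast + non_unicast
    percent = 100.0 * discards / total
    one_in = total // discards if discards else None
    return percent, one_in

for name, counts in samples.items():
    percent, one_in = discard_stats(*counts)
    ratio = f"1 in {one_in:,}" if one_in else "no discards"
    print(f"{name}: {percent:.4f}% ({ratio})")
```

On these figures the second output is the worst case, at roughly 0.028% of received packets, which is consistent with "just under 0.03%".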

The only thing I find weird is that your received and sent discard counts match exactly. I would not expect that at all.
 

Author Comment

by:MrVault
ID: 36531822
It's definitely a 64-bit OS.

I don't know for sure if we're having problems. These cards primarily serve up iSCSI volumes that have parts of a SQL database on them, so we're concerned about data integrity. We have seen cases in the past where enough of these errors happen that the volume actually drops off as if someone had pulled the drive out of the server (if it were physical). Our monitoring application puts the NIC in red (warning) if it exceeds 1000 discards in a single day. It doesn't know percentages.
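To illustrate the difference between the fixed-count rule and a percentage rule, here is a rough Python sketch. The 1000-discard limit is the one our monitoring application uses; the 0.01% limit, the function name `nic_alert`, and the traffic figures in the examples are made-up values for illustration only, not recommendations.

```python
def nic_alert(discards_today, packets_today, max_count=1000, max_percent=0.01):
    """Return (count_rule_fires, percent_rule_fires) for one day's totals."""
    if packets_today:
        percent = 100.0 * discards_today / packets_today
    else:
        percent = 0.0
    return discards_today > max_count, percent > max_percent

# 4,000 discards out of 4 million packets is 0.1%: both rules fire.
print(nic_alert(4000, 4_000_000))
# 4,000 discards out of 400 million packets is 0.001%: only the count rule fires.
print(nic_alert(4000, 400_000_000))
```

A busy iSCSI NIC can trip the count rule on volume alone, which is why the two rules can disagree.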