Solved

Diagnose Server Network Card discards

Posted on 2011-09-08
Last Modified: 2012-05-12
We're running Windows Server 2008 R2 on Dell Poweredge servers. We have about 20. They all have the same network card settings, but some servers are 2950's from a few years ago, others are R710 and others are R510.

All of the servers have 2 onboard NICs enabled with iSCSI support and have the network settings below. However, on every 2950, one of the two NICs shows discards for stretches of time that coincide with the busiest periods on the servers. The servers have an internal C drive (RAID1) and a 4-disk RAID10 set for part of our database (SQL). They also have at least 2 volumes presented by two different SANs via iSCSI. I have updated the drivers to the latest driver available from Dell.

We can see the discards in Perfmon and via our SNMP monitoring software. The discards are only on that one NIC, not on the switch they are plugged into. The switch connects many servers, not just these three. The switch is a Foundry/Brocade 48-port gigabit switch.

I have verified that all ports on the switch have the same settings, and I have set the speed and duplex statically on both the switch ports and the servers in case the auto setting was the cause.

The only thing that's unique about the port on each server that's having the problem is that it's the port with the gateway. The other NIC is on the same subnet but does not have a gateway.

Can anyone help me figure out why these NICs are discarding so many packets (2000+ per day, sometimes 4000+)? Thanks!

Settings across all servers (if the NIC has the setting):

ethernet @ wirespeed             Enabled
flow control                     Tx & Rx Enabled
interrupt moderation             Disabled
ipv4 checksum offload            None
ipv4 large send offload          Disabled
ipv6 checksum offload            None
ipv6 large send offload          Disabled
jumbo mtu                        9000
locally admin address            Not Present
number of rss queues             8
Pause On Exh. Host Ring          Disabled
priority & vlan                  P & V Enabled
receive buffers                  3000
receive side scaling             Disabled
speed & duplex                   1 GB Full Auto
tcp connection offload (ipv4)    Disabled
tcp connection offload (ipv6)    Disabled
transmit buffers                 5000
vlan id                          0
VMQ Look Ahead Split             Disabled
Power Management                 Off

Question by: MrVault

Accepted Solution

by: eeRoot
eeRoot earned 200 total points
Can you try a different cable and a different port on the switch? Can you lower the MTU from 9000 to 1500 for a few hours? Are there any errors in the Windows event log or the switch log?

Author Comment

by: MrVault
I can't really drop it down to 1500. This is serving iSCSI on a SAN. That will definitely cause issues because the SAN is presenting them at 9000.

The cable and port are the only things we haven't tried yet. Two issues with that though. First, what are the chances that the 1st port on all three 2950's or the cable in the first port, or the ports on the switch for just those three servers are all having issues? It's not the 2nd on any of them. Second, we have no more free ports. So that option will have to wait.

Any other ideas?

Assisted Solution

by: eeRoot
eeRoot earned 200 total points
The last thing I could recommend trying is to disable flow control and see if that reduces the number of errors. Aside from that, the only other thing I can think of would be to call Dell and see if they have any recommendations or design considerations for iSCSI connections on these NICs.

Author Comment

by: MrVault
Thanks. We did contact Dell and are running what they recommend. I'm a bit leery of disabling flow control because when we forget to enable that on other servers, these issues appear. Only by enabling it do they go away. Thanks for the suggestions!

Assisted Solution

by: giltjr
giltjr earned 300 total points
I would not look at the number of packets being discarded; I would look at the percentage of packets.

I know that 2000 - 4000 packets a day sounds like a lot, but if you have 4 million packets a day and drop 4,000 of them, that's 1 out of every 1,000 packets, or 1/10 of a percent.
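
As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name discard_rate is made up for this example; it only turns a discard count and a packet total into a percentage.

```python
def discard_rate(discards: int, total_packets: int) -> float:
    """Return discards as a percentage of total packets (0.0 if no traffic)."""
    if total_packets <= 0:
        return 0.0
    return 100.0 * discards / total_packets


# Figures from the example above: 4,000 discards out of 4 million packets.
rate = discard_rate(4_000, 4_000_000)
print(f"{rate:.3f}% of packets discarded")
```

A monitoring threshold expressed this way scales with traffic, unlike a fixed discards-per-day count.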


Are these NIC's dedicated to iSCSI traffic?

Are you getting any traffic over the other NIC?

Is the iSCSI SAN on the same IP subnet as the server?

Author Comment

by: MrVault
The SAN is on the same subnet. Right now, like all the other servers, these NICs carry both storage and regular non-iSCSI traffic. That will be going away.

Is there a Perfmon stat I can monitor to see % packet discards or errors?

Assisted Solution

by: giltjr
giltjr earned 300 total points
Not that I am aware of.

But by looking at the output of netstat -e, you can see the number of packets (unicast and non-unicast) received, discarded and in error.

Are the discards outbound or inbound from the server's point of view?

Author Comment

by: MrVault
Here's our output for a few servers. Not sure how it computes these with multiple NICs, nor why it appears to happen on one NIC vs the other.

C:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3019229595      2112537785
Unicast packets          1932686966       370425405
Non-unicast packets        39692891           49207
Discards                       9345            9345
Errors                            0               0
Unknown protocols                 0


and

c:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3468778696      2839839847
Unicast packets            54670585        96118013
Non-unicast packets        39878052           50384
Discards                      26748           26748
Errors                            0               0
Unknown protocols                 0


and lastly

c:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3784049537      1700450048
Unicast packets          1317853860      4186925096
Non-unicast packets        39766172           48379
Discards                          0               0
Errors                            0               0
Unknown protocols                 0

Author Comment

by: MrVault
Not sure what time period netstat is based on. Since last reboot? Last 24 hours? Today?

Assisted Solution

by: giltjr
giltjr earned 300 total points
The -e output is the combined total of all Ethernet interfaces. So if you have two or more interfaces, you get one set of numbers that represents all interfaces.

The counters run since the last reboot or counter wrap. I don't know whether the counter is 32 or 64 bit; my guess would be 32-bit on a 32-bit OS and 64-bit on a 64-bit OS.
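
If the counters do wrap, a script that samples them periodically can still compute per-interval deltas. Here is a minimal Python sketch, assuming a 32-bit counter (the width is a guess, not something confirmed in this thread) and at most one wrap between samples. counter_delta is a hypothetical helper name.

```python
COUNTER_BITS = 32  # assumption: the actual counter width is not confirmed


def counter_delta(previous: int, current: int, bits: int = COUNTER_BITS) -> int:
    """Difference between two samples of a counter that wraps at 2**bits.

    Only correct if the counter wrapped at most once between the samples.
    """
    return (current - previous) % (1 << bits)


# No wrap: the counter simply grew by 100.
print(counter_delta(9_245, 9_345))
# Wrap: the counter passed 2**32 and restarted from zero between samples.
print(counter_delta(4_294_967_000, 500))
```

Sampling once a day and taking deltas like this would also answer the question of what time period a given number covers.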

Are you actually having problems?

Your worst case is just under 0.03% discards. That is 1 packet out of every 3,300 packets. Now, I would expect it to be zero, but you don't always get what you expect.
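
For anyone who wants to reproduce that percentage, here is a hedged Python sketch that parses netstat -e text and divides received discards by received packets. The function names are hypothetical, the sample is the second output posted earlier in this thread, and using unicast plus non-unicast received packets as the denominator is an assumption about how the percentage should be defined. It also assumes English counter labels.

```python
import re

# Second netstat -e sample posted in this thread.
SAMPLE = """\
Interface Statistics

                           Received            Sent

Bytes                    3468778696      2839839847
Unicast packets            54670585        96118013
Non-unicast packets        39878052           50384
Discards                      26748           26748
Errors                            0               0
Unknown protocols                 0
"""

# A counter line is a label followed by one or two integers.
LINE_RE = re.compile(r"^(\D+?)\s+(\d+)(?:\s+(\d+))?\s*$")


def parse_netstat_e(text):
    """Map each counter label to (received, sent); sent is None if absent."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, received, sent = match.groups()
            stats[name.strip()] = (int(received), int(sent) if sent else None)
    return stats


def received_discard_percent(stats):
    """Received discards as a percentage of all received packets."""
    packets = stats["Unicast packets"][0] + stats["Non-unicast packets"][0]
    if packets == 0:
        return 0.0
    return 100.0 * stats["Discards"][0] / packets


stats = parse_netstat_e(SAMPLE)
print(f"{received_discard_percent(stats):.3f}% of received packets discarded")
```

For this sample the result is roughly 0.028%, consistent with the "just under 0.03%" figure. Because netstat -e combines all Ethernet interfaces, this cannot say which NIC the discards belong to.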

The only thing I find weird is that your received and sent discard counts match exactly. I would not expect that at all.

Author Comment

by: MrVault
It's definitely a 64-bit OS.

I don't know for sure if we're having problems. These cards primarily serve up iSCSI volumes that hold parts of a SQL database, so we're concerned about data integrity. We have seen cases in the past where enough of these errors happen that a volume actually drops off, as if someone had pulled the drive out of the server (if it were physical). Our monitoring application puts the NIC in red (warning) if it exceeds 1000 discards in a single day. It doesn't know percentages.