500-point question! Ethereal checksum errors and slow Internet access

OK, here is a good one.  I've been experiencing slow Internet access from my site for a while now.  One site in particular will not load: www.good.com.  See the text below of a sniffer trace (Ethereal) — any ideas?  I've attached some troubleshooting that has already happened; we are looking for fresh ideas now.

 
                                 
                                OK, I've been looking into xxxx this evening, and I found something that isn't right.
                                 
                                First, a summary of what you have in NJ.  As you may be aware, NJEDS has three (3) T1s for its site.  The three lines are aggregated into a PPP Multilink bundle, giving a total of ~4.5 Mb/s.  Currently, each connection is limited to 1.5 Mb/s given how things are configured; more on that in a bit.
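                                 
                                (For reference, a minimal sketch of how such a bundle is typically built in IOS; the group number and addressing below are placeholders, not taken from this router:)
                                 
                                interface Multilink1
                                 ip address 192.0.2.1 255.255.255.252  ! placeholder addressing
                                 ppp multilink
                                 ppp multilink group 1
                                !
                                interface Serial0/0/0:0
                                 no ip address
                                 encapsulation ppp
                                 ppp multilink
                                 ppp multilink group 1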

                                 
                                Here are the three interfaces:
                                 
                                NJEDS-ER-Site#
                                NJEDS-ER-Site#
                                NJEDS-ER-Site#show int ser0/0/0:0
                                Serial0/0/0:0 is up, line protocol is up
                                  Hardware is GT96K Serial
                                  Description: T1 to Paetec (CID# xx)
                                  MTU 1500 bytes, BW 1536 Kbit, DLY 20000 usec,
                                     reliability 255/255, txload 88/255, rxload 5/255
                                  Encapsulation PPP, LCP Open, multilink Open
                                  Link is a member of Multilink bundle Multilink1, loopback not set
                                  Keepalive set (10 sec)
                                  Last input 00:00:00, output 00:00:00, output hang never
                                  Last clearing of "show interface" counters 6d22h
                                  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
                                  Queueing strategy: fifo
                                  Output queue: 0/40 (size/max)
                                  5 minute input rate 34000 bits/sec, 32 packets/sec
                                  5 minute output rate 535000 bits/sec, 98 packets/sec
                                     22848457 packets input, 2888792823 bytes, 0 no buffer
                                     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
                                     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
                                     62672123 packets output, 2924484279 bytes, 0 underruns
                                     0 output errors, 0 collisions, 0 interface resets
                                     0 output buffer failures, 0 output buffers swapped out
                                     0 carrier transitions
                                  Timeslot(s) Used:1-24, SCC: 0, Transmitter delay is 0 flags
                                NJEDS-ER-Site#
                                NJEDS-ER-Site#
                                NJEDS-ER-Site#
                                NJEDS-ER-Site#show int ser0/0/1:0
                                Serial0/0/1:0 is up, line protocol is up
                                  Hardware is GT96K Serial
                                  Description: T1 to Paetec (CID# xxxxx)
                                  MTU 1500 bytes, BW 1536 Kbit, DLY 20000 usec,
                                     reliability 255/255, txload 89/255, rxload 5/255
                                  Encapsulation PPP, LCP Open, multilink Open
                                  Link is a member of Multilink bundle Multilink1, loopback not set
                                  Keepalive set (10 sec)
                                  Last input 00:00:00, output 00:00:00, output hang never
                                  Last clearing of "show interface" counters 6d22h
                                  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
                                  Queueing strategy: fifo
                                  Output queue: 0/40 (size/max)
                                  5 minute input rate 32000 bits/sec, 30 packets/sec
                                  5 minute output rate 537000 bits/sec, 98 packets/sec
                                     22880386 packets input, 2887525686 bytes, 0 no buffer
                                     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
                                     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
                                     62690729 packets output, 2936854900 bytes, 0 underruns
                                     0 output errors, 0 collisions, 0 interface resets
                                     0 output buffer failures, 0 output buffers swapped out
                                     0 carrier transitions
                                  Timeslot(s) Used:1-24, SCC: 1, Transmitter delay is 0 flags
                                NJEDS-ER-Site#
                                NJEDS-ER-Site#
                                NJEDS-ER-Site#
                                NJEDS-ER-Site#show int ser0/1/0:0
                                Serial0/1/0:0 is up, line protocol is up
                                  Hardware is GT96K Serial
                                  Description: T1 to Paetec (CID# xxxx..NJ)
                                  MTU 1500 bytes, BW 1536 Kbit, DLY 20000 usec,
                                     reliability 255/255, txload 88/255, rxload 5/255
                                  Encapsulation PPP, LCP Open, multilink Open
                                  Link is a member of Multilink bundle Multilink1, loopback not set
                                  Keepalive set (10 sec)
                                  Last input 00:00:00, output 00:00:00, output hang never
                                  Last clearing of "show interface" counters 6d22h
                                  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
                                  Queueing strategy: fifo
                                  Output queue: 0/40 (size/max)
                                  5 minute input rate 32000 bits/sec, 30 packets/sec
                                  5 minute output rate 535000 bits/sec, 97 packets/sec
                                     22813342 packets input, 2862958091 bytes, 0 no buffer
                                     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
                                     107811 input errors, 82285 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
                                     62693039 packets output, 2933002928 bytes, 0 underruns
                                     0 output errors, 0 collisions, 0 interface resets
                                     0 output buffer failures, 0 output buffers swapped out
                                     0 carrier transitions
                                  Timeslot(s) Used:1-24, SCC: 0, Transmitter delay is 0 flags
                                NJEDS-ER-Site#
                                NJEDS-ER-Site#
                                 
                                 
                                Those three physical interfaces above are bundled into a single logical interface called a Multilink interface.  Here is the Multilink info on it (output of "show ppp multilink"):

                                 
                                Multilink1, bundle name is Px
                                  Endpoint discriminator is Pxx1
                                  Bundle up for 11w6d, 90/255 load
                                  Receive buffer limit 36000 bytes, frag timeout 1000 ms
                                    0/0 fragments/bytes in reassembly list
                                    41895 lost fragments, 6878661 reordered
                                    3186/3540062 discarded fragments/bytes, 3186 lost received
                                    0xAC53EB received sequence, 0x9D314F sent sequence
                                  Member links: 3 active, 0 inactive (max not set, min not set)
                                    Se0/1/0:0, since 11w6d
                                    Se0/0/0:0, since 11w6d
                                    Se0/0/1:0, since 9w1d
                                 
                                 
                                When you have a multilink path, reordering is normal and expected, as packets can be sent by the other side of the link "out of order", at which point we have to hold onto the packet(s) in memory and reorder them.  To me, though, the number seems a tad high given the amount of time the link has been up, but I'm just guessing.

                                 
                                With all that said, here's what I think is happening, and what I need you to do:
                                 
                                1) Notice the errors on Serial0/1/0:0.  A while back we were getting errors on one of the serial interfaces, and we called Paetec about it, but they said they were clean on their side.  Testing on our side AT THAT POINT IN TIME showed no errors, so we couldn't continue forward on it.  I kept an eye on it for a little bit and it seemed fine...  But it looks like the errors have returned, and I need you to work with Paetec and TelCo to get this resolved.  The above output of the interfaces should show Paetec the errors, which is why I included them.  As far as I'm aware, the Circuit IDs (CID#) should be accurate, assuming the cables haven't been swapped.  As I see it, this is Paetec's to solve, so put the work on them (if possible), but they will need your assistance to let TelCo in, etc.  (See the controller-check sketch after this list.)

                                 
                                2) Because of the errors on Serial0/1/0:0, some packets are being dropped.  These packets are causing retries, thus introducing additional delays.  Do they account for the large delays that Seth experienced?  I doubt it, but it's not helping...  The good thing, and this is by design, is that the three T1s are in a PPP Multilink.  As such, if we think the errors are really causing the issues, we can disconnect that T1 and the PPP Multilink will continue to function (a sketch follows after this list).

                                 
                                3) Once we get this T1 resolved, we can see about changing the way your PPP Multilink sends packets out.  Right now, each session/stream of traffic is bound to a SINGLE T1.  We can change it so that traffic is load-balanced not by session but on a per-packet basis.  This will essentially maximize your total bandwidth.  But I don't want to do this until the serial interface is fixed and is not taking a greater percentage of errors than your other links; enabling it before that would be bad, as every third packet could have a problem.  In order for this to be fully useful, we also need the ISP to do the same thing on their end.  So, once we get this corrected, we'll need to have Paetec do the same.  I'd strongly suggest that you communicate our request to them sooner rather than later, and get from them in an e-mail that they can/will do it upon request.  I say this because ISPs are VERY reluctant to do this, as it increases the CPU load on their routers (each packet needs to be inspected by the CPU).  Hence they will often push back, and for good reason from their point of view.  FYI, the command that does this is "ip load-sharing per-packet" (a configuration sketch follows below), but they should know that.  I'm mentioning it in case they are not familiar with it and want to look it up in the Cisco documentation.
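                                 
                                (For item 1, a sketch of how to pull the carrier-facing T1 statistics before engaging Paetec; I'm assuming the controller numbering matches the serial slot/port, so verify on the chassis:)
                                 
                                NJEDS-ER-Site#show controllers t1 0/1/0
                                ! look for Line/Path Code Violations and the clock source in the
                                ! interval counters; rising violation counts implicate the circuit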
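                                 
                                (For item 2, taking the errored member out of service while leaving the bundle up is, as a sketch, just an interface shutdown:)
                                 
                                NJEDS-ER-Site#configure terminal
                                NJEDS-ER-Site(config)#interface Serial0/1/0:0
                                NJEDS-ER-Site(config-if)#shutdown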
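                                 
                                (For item 3, a sketch of the change on our side; this assumes CEF is enabled, since per-packet sharing is a CEF switching behavior, and again: only after the errored T1 is clean, and only with Paetec mirroring it on their end:)
                                 
                                NJEDS-ER-Site#configure terminal
                                NJEDS-ER-Site(config)#interface Multilink1
                                NJEDS-ER-Site(config-if)#ip load-sharing per-packet
                                NJEDS-ER-Site(config-if)#end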

                                 




220       "9.149825"       "10.14.66.0"       "65.200.201.183"       "TCP"       "3461 > http [ACK] Seq=551 Ack=2720 Win=65535 [TCP CHECKSUM INCORRECT] Len=0"                              
221       "9.149881"       "65.200.201.189"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
222       "9.149924"       "65.200.201.189"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
223       "9.149934"       "10.14.66.0"       "65.200.201.189"       "TCP"       "3463 > http [ACK] Seq=263 Ack=1461 Win=65535 [TCP CHECKSUM INCORRECT] Len=0"                              
224       "9.150675"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
225       "9.150692"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
226       "9.150713"       "10.14.66.0"       "65.200.201.183"       "TCP"       "3461 > http [ACK] Seq=551 Ack=4180 Win=65535 [TCP CHECKSUM INCORRECT] Len=0"                              
227       "9.150791"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
228       "9.151453"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
229       "9.151478"       "10.14.66.0"       "65.200.201.183"       "TCP"       "3461 > http [ACK] Seq=551 Ack=5640 Win=65535 [TCP CHECKSUM INCORRECT] Len=0"                              
230       "9.151664"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
231       "9.152200"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
232       "9.152228"       "10.14.66.0"       "65.200.201.183"       "TCP"       "3461 > http [ACK] Seq=551 Ack=7100 Win=65535 [TCP CHECKSUM INCORRECT] Len=0"                              
233       "9.169350"       "65.200.201.189"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
234       "9.169372"       "65.200.201.189"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
235       "9.169395"       "10.14.66.0"       "65.200.201.189"       "TCP"       "3463 > http [ACK] Seq=263 Ack=2921 Win=65535 [TCP CHECKSUM INCORRECT] Len=0"                              
236       "9.169412"       "65.200.201.189"       "10.14.66.0"       "HTTP"       "HTTP/1.1 200 OK (application/x-javascript)"                              
237       "9.209488"       "65.200.201.189"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
238       "9.209523"       "65.200.201.189"       "10.14.66.0"       "HTTP"       "HTTP/1.1 200 OK (application/x-javascript)"                              
239       "9.209542"       "10.14.66.0"       "65.200.201.189"       "TCP"       "3464 > http [ACK] Seq=269 Ack=1266 Win=64270 [TCP CHECKSUM INCORRECT] Len=0"                              
240       "9.229354"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
241       "9.229385"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
242       "9.229408"       "10.14.66.0"       "65.200.201.183"       "TCP"       "3461 > http [ACK] Seq=551 Ack=8560 Win=65535 [TCP CHECKSUM INCORRECT] Len=0"                              
243       "9.229467"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
244       "9.229480"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
245       "9.229490"       "10.14.66.0"       "65.200.201.183"       "TCP"       "3461 > http [ACK] Seq=551 Ack=10020 Win=65535 [TCP CHECKSUM INCORRECT] Len=0"                              
246       "9.265050"       "10.14.66.0"       "10.14.64.254"       "DNS"       "Standard query A us.bc.yahoo.com"                              
247       "9.270274"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
248       "9.270299"       "65.200.201.183"       "10.14.66.0"       "TCP"       "[TCP segment of a reassembled PDU]"                              
249       "9.270318"       "10.14.66.0"       "65.200.201.183"       "TCP"       "3461 > http [ACK] Seq=551 Ack=11480 Win=65535 [TCP CHECKSUM INCORRECT] Len=0"                              
 
rsivanandan Commented:
How is the line clock set up? I mean, do you and your ISP agree on the clock settings? Usually it's taken from the line, I believe.

If so, did you try 'clear counters' to flush all the errors and input problems from the router interfaces? Or can you do that?

Then see if you still get any problems. It sounds crazy, but it happens with Cisco routers; I've seen it a couple of times. That alone doesn't solve the underlying problem, though.
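A sketch of the commands in question, with names taken from the outputs above (the clock source shows up near the top of the controller output):

NJEDS-ER-Site#show controllers t1 0/1/0
NJEDS-ER-Site#clear counters Serial0/1/0:0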

Cheers,
Rajesh
 
jcistaro (Author) Commented:
The counters have been cleared and reset.  This didn't seem to help the situation.
 
rsivanandan Commented:
Did your interfaces come up okay, both in the router diagnostics and by the link lights?

Cheers,
Rajesh
 
Sam Panwar (Sr. Server Administrator) Commented:
Hi,

One forum result:

If the packets that have incorrect TCP checksums are all being sent by the machine on which Ethereal is running, this is probably because the network interface on which you're capturing does TCP checksum offloading. That means that the TCP checksum is added to the packet by the network interface, not by the OS's TCP/IP stack; when capturing on an interface, packets being sent by the host on which you're capturing are directly handed to the capture interface by the OS, which means that they are handed to the capture interface without a TCP checksum being added to them.

The only way to prevent this from happening would be to disable TCP checksum offloading, but

   1. that might not even be possible on some OSes;
   2. that could reduce networking performance significantly.

However, you can disable the check that Ethereal does of the TCP checksum, so that it won't report any packets as having TCP checksum errors, and so that it won't refuse to do TCP reassembly due to a packet having an incorrect TCP checksum. That can be set as an Ethereal preference by selecting "Preferences" from the "Edit" menu, opening up the "Protocols" list in the left-hand pane of the "Preferences" dialog box, selecting "TCP" from that list, turning off the "Check the validity of the TCP checksum when possible" option, clicking "Save" if you want to save that setting in your preference file, and clicking "OK".

It can also be set on the Ethereal or Tethereal command line with a -o tcp.check_checksum:false command-line flag, or manually set in your preferences file by adding a tcp.check_checksum:false line.
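
For example, a hypothetical invocation reading back a saved capture with the check disabled (capture.cap is a placeholder file name):

tethereal -o tcp.check_checksum:false -r capture.cap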


Checksum notes:

Ethereal checksum validation

Ethereal will validate the checksums of several protocols, e.g.: IP, TCP, ...

It will do the same calculation as a "normal receiver" would do, and shows the checksum fields in the packet details with a comment, e.g.: [correct], [invalid, must be 0x12345678] or alike.

Checksum validation can be switched off for various protocols in the Ethereal protocol preferences, e.g. to (very slightly) increase performance.

If the checksum validation is enabled and it detected an invalid checksum, features like packet reassembling won't be processed. This is avoided as incorrect connection data could "confuse" the internal database.
Checksum offloading

The checksum calculation might be done by the network driver, protocol driver or even in hardware.

For example: The Ethernet transmitting hardware calculates the Ethernet CRC32 checksum and the receiving hardware validates this checksum. If the received checksum is wrong Ethereal won't even see the packet, as the Ethernet hardware internally throws away the packet.

Higher level checksums are "traditionally" calculated by the protocol implementation and the completed packet is then handed over to the hardware.

Recent network hardware can perform advanced features such as IP checksum calculation, also known as checksum offloading. The network driver won't calculate the checksum itself but simply hand over an empty (zero or garbage filled) checksum field to the hardware.
Note!

Checksum offloading often causes confusion as the network packets to be transmitted are handed over to Ethereal before the checksums are actually calculated. Ethereal gets these "empty" checksums and displays them as invalid, even though the packets will contain valid checksums when they leave the network hardware later.

Checksum offloading can be confusing and having a lot of [invalid] messages on the screen can be quite annoying. As mentioned above, invalid checksums may lead to unreassembled packets, making the analysis of the packet data much harder.

You can do two things to avoid this checksum offloading problem:

    * Turn off the checksum offloading in the network driver, if this option is available (a Linux sketch follows below).

    * Turn off checksum validation of the specific protocol in the Ethereal preferences.
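
On Linux, for instance, the driver-side option might look like this, assuming the NIC driver supports toggling it (eth0 is a placeholder interface name):

# disable TX checksum offloading so outgoing packets carry real checksums
ethtool -K eth0 tx off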
 
Sam Panwar (Sr. Server Administrator) Commented:
Hi,

Slow problem:

Ethereal has one domain where it is poor, and that is NCP decodes. No one so far has taken the time to include an exhaustive list of NCP decodes in the program. For this reason, Ethereal is not very good for diagnosing NetWare client connection problems, at least as long as there are no obvious problems at the IP level. Actually, most IP performance problems (e.g. IPX fine, IP slow) are due to configuration problems at the switch or NIC level and are not related to protocol problems which could be seen with Ethereal. So the first thing to do is verify your switches and NICs and see if the actually used duplex settings for NICs and switches are the same. Don't trust the configured values; if possible, try to find out what the devices decided to finally use. On the server, for instance, most NIC drivers either show the duplex setting at load time (and you can find it in console.log) or in the LAN/WAN statistics in monitor.nlm (under custom counters).
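
As an illustration, on a Cisco switch the actually negotiated values can be read like this (the interface name is a placeholder):

Switch#show interfaces FastEthernet0/1 status
! the Duplex column shows what was negotiated (e.g. a-full for
! autonegotiated full duplex), not just what was configured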

Note that you posted your message as a new thread rather than as a followup to previous discussions you may have had on this issue, so I have no idea what you might already have done or discussed with other sysops. I recommend you keep posting in the same thread as long as you are working on the same issue. This would make it easier for everyone who is involved in the discussion.



Forum discussion, according to Ronnie Sahlberg:
The problem now is that for VERY large captures, Ethereal is always slow under all circumstances.
So let us start with just a simple random generic capture and measure with it, to try to keep the number of variables low.
(If, as you say, the number of sessions affects it as well, do you mean the number of TCP sessions, or what kind of sessions?
 At some point, when the worst performance problem has been addressed, this would be a very interesting area to look at.
 (I could create different synthetic capture files to measure with: same number of packets, same payload, just different numbers of sessions.)
 Make a note that you have observed the number of sessions to possibly have an effect on the dissection speed, so we don't forget to look at it further down the track.
)

I currently believe that during refiltering of a capture, most time would be spent inside file.c/add_packet_to_packet_list().
It would be VERY VERY useful to verify that this assumption is correct.
I would really like someone to look at gprof data and analyze where most time is consumed, to either verify my claim about add_packet_to_packet_list() or to invalidate it.

The thing inside this function that I believe consumes the most CPU is where we call epan_dissect_run() and perform a full dissection of the packet.

As I see it, apart from the initial time we encounter the packet during file read (or live capture), there are not that many instances where we really must dissect the packet at all.
OK, if we select a packet in the list so it gets displayed in the dissect pane, that might be an exception, but that is not something we do 100,000 times per capture anyway, so the performance of that is irrelevant.
We might also need to do a full rescan/redissect of all packets IF we have changed the preferences in such a way that the packets will be dissected differently, or when we have changed stuff using Decode As.

However, for me and many other users, the MAIN reason Ethereal rescans the packet list is because we have applied or changed a filter. Some users will filter and refilter a capture file over and over and over: ten, twenty, thirty if not more times for each capture they work with.
The same happens when a ConversationList dialog or a ServiceResponseTime dialog is opened.

Well, enough of that. On to my idea:

Hypothesis: a significant part of the slowness of Ethereal when refiltering a capture file comes from the expensive calls to epan_dissect_run() made from add_packet_to_packet_list() in file.c.
Potential fix: reduce the number of calls made to epan_dissect_run(), at the expense of additional memory requirements (enabled by a preference).

Assume that most of the time, when we perform a full rescan/redissect of the capture file, we really just want to reapply a display filter (and are not doing anything that affects how a packet is dissected).

What do we need in order to refilter the packet list if we do not allow calling epan_dissect_run()?
1. We need to remember all COL values for all packets, so that we can just reapply them when adding the packet to the packet list without calling the dissector and recreating them that way. This will consume additional memory.
2. For every packet, we need to keep a list of all the hf_fields that were encountered in the packet.
   This list contains the index of the hf variable as well as the value it has.
   Nothing else needs to be stored there (in order to reduce the impact on memory).
   This list may NOT be pruned the way the edt structs are, because we want to be able to still use this list even after the filters have changed (and thus the pruning would be different). No pruning.
The "ApplyFilterToEdtStructure" functions would need to be changed (or duplicated) so they could operate on the list in 2 instead of on the edt structure.
These functions might also need to be looked at so that they would be efficient even for very large lists (no pruning).

1 would allow us to rebuild the packet list without needing to call the dissector (?).
2 would allow us to refilter the entire trace without calling any dissectors.

ideas, comments?

Right now it would be nice if someone could create a capture as I proposed earlier and use gprof to check where most of the CPU is spent when refiltering the capture, to verify whether my assumptions are correct or invalidate them.
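
A sketch of such a profiling run, assuming a build from source (the flags and file names here are illustrative, not from the original mail):

# build with profiling instrumentation
./configure CFLAGS=-pg && make
# run ethereal, load a large capture, refilter a few times, then quit;
# gmon.out is written to the working directory on exit
./ethereal
# attribute CPU time to functions such as add_packet_to_packet_list()
gprof ./ethereal gmon.out > profile.txt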

(
As a nice benefit in the future, IF we were to have that list of fields for each packet easily available, we could do things like merging this list between packets.
Say #6 is the Call and #27 is the Response.
Since these packets are paired, we could merge the lists from these two packets into a single one.
Then, when searching for something that occurred in the Response packet, we would automatically also pick up the matching Call packet, since their lists were merged.
I.e. filtering for smb.error==foo would find both the Response that barfed saying foo and the matched Call to this Response.
That would also be useful.
)
 
rsivanandan Commented:
Thanks.

Cheers,
Rajesh
