PPPOA disconnects with large file uploads (only)

Posted on 2008-10-04
Last Modified: 2013-12-24
We have been experiencing intermittent/temporary ADSL PPPOA dropouts at a couple of sites.
Typically the ADSL modem is an Actiontec GT701, although we have also used an old Wirespeed and a Cisco 1710.

The test to recreate the problem:
Set up a continuing ping to a couple of remote sites.
Start a large FTP file transfer OUT of the local site, i.e. an "upload".
(Outgoing emails with larger attachments seem to cause the problem during normal operations).

With the ftp transfer going, we see a few ping dropouts and then a total dropout.
The modem drops the PPPOA connection when this happens.
The problem clears itself after 20, 30 or 40 seconds, depending on the particular configuration of the modem, etc., and this interruption time is fairly constant from one outage to the next.

This happened at one site in July and we made it better / acceptable by changing the modem from PPPOE to PPPOA and assigning a public IP address to the modem.  

Then it started happening a couple of weeks ago at another site about 50 miles away.
The telco and the ISP are the same at both sites, and the two paths become common at some point before, or when, they reach the ISP's equipment.

I have good contact / communication with the ISP and only third-hand via the ISP with the telco.
We are all scratching our heads as to what must be happening and what we might do about it.

We can rule out all of our office site equipment: I can connect a bare laptop directly to the modem and recreate the problem.

Question by: Fred Marshall
 
Assisted Solution

by: kyleb84
Theory:

When pushing a large file across the DSL link, utilisation climbs and the SNR may drop close to the threshold; the DSLAM then resets the connection.

What are your SNR Margins for the links?

Has the ISP tried a different DSLAM Port? Maybe a different DSLAM?

Can you go to the Master Telco Frame in your building, patch an ADSL modem in there and try the upload?

Author Comment

by: Fred Marshall
OK - this is interesting although I don't necessarily understand all the terms.
I *do* understand SNR.

This situation looks like this:

The telco provides the copper and ... whatever else.
The ISP is separate and provides the PPPOA as I understand it.
The ISP is stumped.
I believe the DSLAM is with the telco.  So the DSLAM port and DSLAM questions would have to be put to them.

The ISP is indeed the "service provider" and all the telco provides is the line.
The ISP is responsible for the DSL although it flows through telco equipment.
Thus, the prodding needs to be through the ISP but likely lands with the telco.

There is no "Master Telco Frame" as nearly as I can determine.  The buildings are rather small.  We have switched modems, checked the lines, etc.

Thanks!

Assisted Solution

by: Press2Esc
Hope you're ready to roll up your sleeves...

Q: are you running across vpn?  
Q: are you dropping packets regularly?
Q: post results from "ping google.com -l 1400 -n 15"  (l=buffer size, n=count)
Q: post results from "tracert google.com"
Q: when you run the PPPoA (or PPPoE) authentication either in the dsl modem/router OR the cisco - are the intermittent results the same?
Q: whichever device (e.g., Actiontec, Wirespeed, Cisco, etc.) is running the PPP, please post its line stats (Cisco: "#sh dsl int atm 0").
Q: what kind of speeds are you running, and have you run a speed test (speakeasy.net/speedtest) to confirm them?
Q: who is your ISP?

P2E

Author Comment

by: Fred Marshall
OK - I can answer most of your questions off the top of my head:

First, we were running the modem in PPPOE bridge mode, so it looks like a switch port, has no IP address of its own, and passes through our multiple static public IP addresses.
Then, on a hunch, we switched to PPPOA, assigning the modem one IP address outside our block of addresses.  This seemed to help.  Actually, this was *the* solution for a couple of months at the first site to fail.  Then the second site failed and this "fix" didn't work.
Well, we never did isolate the root cause, so the fix was a band-aid to begin with.  And thus my question here at EE.

The multiple public IP addresses are "connected" via a single ADSL connection - so the devices share the bandwidth.  Most of the time the bandwidth is modest anyway.

VPN: yes on a couple of the IPs; no on others; like this:
VPN #1 and VPN#2 on an RV042 with its own IP.
3rd party VPN on a new Cisco VPN router with its own IP (not under our control / no access).
Internet firewall on a Juniper Networks SSG5 with its own IP.
... it just worked out best this way, although we might have used one fewer IP address by running the VPNs through the Juniper.  The RV042 VPNs just work and preceded the Juniper box.

For testing:
Continuously ping an address on the 3rd party VPN remote subnet.  This is critical to operations, so it makes sense to test it.
Continuously ping google through the firewall.
Initiate a large file upload using ftp, through the firewall.

The ping times increase from around 100 ms to over 1,000 ms in some cases during the upload.
If it's *not* going to fail, pings drop out somewhat more often than usual, but not *too* often.
In a "failure", pings drop out first one at a time, then a few, then all of them continuously for 30 to 40 seconds.  This causes the 3rd party VPN to drop out along with everything else.
Then the connection is automatically remade and the pings return.
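
To put numbers on ping logs like the ones described above (the 30 to 40 second gaps during a failure), a small script can scan the results and report each run of consecutive timeouts. This is a minimal sketch, assuming the ping output has already been parsed into (timestamp in seconds, round-trip time in ms, or None for a timeout) pairs; the parsing step is not shown, and the `find_outages` name and the `min_lost` threshold of 5 are illustrative choices, not part of any tool used in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Outage:
    start: float  # timestamp (s) of the first lost ping in the run
    end: float    # timestamp (s) of the first reply after the run (or last sample)
    lost: int     # number of consecutive pings that got no reply

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_outages(
    samples: Sequence[Tuple[float, Optional[float]]], min_lost: int = 5
) -> List[Outage]:
    """Return runs of at least `min_lost` consecutive timeouts.

    `samples` is a time-ordered list of (timestamp_s, rtt_ms) pairs,
    with rtt_ms = None for a ping that timed out.
    """
    outages: List[Outage] = []
    run_start: Optional[float] = None
    run_len = 0
    for ts, rtt in samples:
        if rtt is None:
            if run_start is None:
                run_start = ts  # first timeout of a new run
            run_len += 1
        else:
            if run_start is not None and run_len >= min_lost:
                outages.append(Outage(run_start, ts, run_len))
            run_start, run_len = None, 0  # any reply ends the current run
    # A run still open at the end of the log is closed at the last sample.
    if run_start is not None and run_len >= min_lost:
        outages.append(Outage(run_start, samples[-1][0], run_len))
    return outages


if __name__ == "__main__":
    # Example: one ping per second, ~100 ms replies, then 35 timeouts.
    log = (
        [(float(t), 100.0) for t in range(10)]
        + [(float(t), None) for t in range(10, 45)]
        + [(float(t), 110.0) for t in range(45, 50)]
    )
    for o in find_outages(log):
        print(f"outage at t={o.start:.0f}s, {o.lost} lost, {o.duration:.0f}s")
```

Running two copies side by side, one on the VPN ping log and one on the google ping log, would show whether both streams stop at the same moment, which is what the thread reports for a PPPOA disconnect.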

We aren't dropping packets regularly unless two things are happening:
1) the "problem" exists, which generally means it is there for days or longer and maybe never goes away.  I don't know, because we've always "fixed" it or it fixed itself after a number of days/weeks.
2) there is a large file upload happening like an email with a large attachment.  Most other things don't cause dropouts.

If there's not going to be a failure, packet loss rises only slightly from nil during an upload.

I don't have a tracert to google but could generate one if necessary.  Right now I don't know that I can make the failure occur, and a tracert would only work before a failure anyway.

The Cisco "modem" seems to have a bit better results but it too has failed on occasion.

I can get results from the Actiontec but I'm not sure it generates line stats.  
I don't have access to the Cisco "modem".
I'm not sure but I think the Cisco is still in use at the one site it's located.
I can try to access the Actiontec there - if I don't see it then it's not connected / in use.

Yes, we've run speakeasy.net tests often enough.  One site has 3M down and 500k up.  Another has around 1700k down and 500k up ... more or less.  Both of these sites have failed at one time or another.

The ISP is Reachone.  The telco is Centurytel.  
This is in SW Washington state.

After you've absorbed this, let me know what I can do to improve the information, including any of the specific things you've already asked for.

Author Comment

by: Fred Marshall
While it should be obvious because this is about PPPOA dropouts:

Occasionally one ping stream shows packet loss while the other does not, but never when there's a failure.  When one ping stream stops responding altogether, so does the other, coinciding with a PPPOA disconnect.

Expert Comment

by: Press2Esc
Thanks for the info...

From your speedtest results, I suspect you may have a provisioning issue and/or a line capacity issue.  Unfortunately, it does not appear the Actiontec has a line spec listing...

I would encourage you to contact the tech helpdesk and ask them for your specific line characteristics (e.g., noise, attenuation, capacity, provisioning, etc.).  When you get this critical info, post it.  If applicable, I can decipher the info.

BTW, is the DSL line provisioned for 3M/? or 6M/512?  Is Reachone (ISP) simply a reseller, with Centurytel owning the equipment (e.g., DSLAM, NOC, etc.)?

Judging by your network needs and multiple subnets, you are very likely pushing (beyond?) your connection capacity.  How many workstations, servers, routers/firewalls, etc. in total are sharing the DSL line?

My initial request for a tracert was because this data may indicate latency and/or potential TTL issues.



P2E

Author Comment

by: Fred Marshall
I need to "package" this information so that I can pass it to the ISP, who will pass it to the telco as appropriate.

Reachone is an ISP with servers and routers providing a variety of network services: IP address blocks, email, domain hosting, etc.

Centurytel is a telco that provides communications (including ADSL) from the customer to an interim geographical point, from which Reachone takes over on their own equipment, links, etc.

It's my understanding that the PPPOA termination is in a Reachone router.

The initial office to fail has 8 or 9 workstations.
I have described the devices sharing the DSL lines.
The second office to fail has 6 workstations.
Same description.

DSLAMs are Centurytel's.

Tracing route to google.com [64.233.187.99]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.xxx.xxx
  2     1 ms     2 ms     1 ms  dsl-238-81.satsop.reachoneinternet.net [216.177.238.81]
  3    42 ms    45 ms    42 ms  dsl-230-1.satsop.reachoneinternet.net [216.177.230.1]
  4    46 ms    46 ms    46 ms  fa0-0-cr01-pdxp.reachoneinternet.net [216.177.255.37]
  5    50 ms    47 ms    55 ms  ip65-47-24-97.z24-47-65.customer.algx.net [65.47.24.97]
  6    47 ms    52 ms    47 ms  p4-3-0.mar1.beaverton-or.us.xo.net [207.88.83.201]
  7    52 ms    52 ms    51 ms  p5-1-0-1.rar1.seattle-wa.us.xo.net [65.106.0.141]
  8    71 ms    73 ms    69 ms  p5-0-0.RAR2.SanJose-CA.us.xo.net [65.106.0.50]
  9    68 ms    71 ms    69 ms  207.88.14.105.ptr.us.xo.net [207.88.14.105]
 10    69 ms    68 ms    70 ms  207.88.14.106.ptr.us.xo.net [207.88.14.106]
 11    69 ms    72 ms    74 ms  207.88.187.50.ptr.us.xo.net [207.88.187.50]
 12    71 ms    70 ms    70 ms  216.239.46.194
 13   244 ms    76 ms   197 ms  72.14.239.15
 14   154 ms   131 ms   150 ms  72.14.236.175
 15   153 ms   195 ms   158 ms  216.239.49.226
 16   151 ms   144 ms   137 ms  72.14.236.175
 17   144 ms   146 ms   148 ms  jc-in-f99.google.com [64.233.187.99]

Trace complete.

Expert Comment

by: Press2Esc
FM, the trace looks good, no TTL issues.

Reachone is your reseller, so I'm guessing you must go through them (?).  In any case, I would guess that Reachone's clout with Centurytel depends predominantly on their size and contract.

As stated earlier, contact your provider and ask them for all your "line stats", aka your DSL footprint.  Also, have them verify your provisioning.  Hopefully their tech support will understand the ramifications of the specs as they may relate to your issue.  If not, post 'em.

P2E

Author Comment

by: Fred Marshall
Yes.  We are Reachone's customer.

Have sent the request.

Author Comment

by: Fred Marshall
Here are the stats from the modem at our end.  This seems to be the best that they can suggest.  I note that the downstream margin of 29 (I believe that's dB) and the upstream margin of 19 never change, for whatever that's worth.

I have found a workaround that at least I have control over.  We have a managed switch just downstream from the modem, and it has QoS capability.  It seems that if I shape the upstream traffic and set its bandwidth limit *just below* the upstream speed that the otherwise unlimited link actually delivers, then the ping times don't increase and the dropouts don't occur during a large file upload.  Without this internal bandwidth limit, the problems occur.  I don't think anyone will notice the difference in speed, as it's "close".

Now, philosophically, I don't think that *we* should have to limit the bandwidth in order for the link to work properly - but a fix is a fix.
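
The workaround above, capping upstream traffic slightly below what the link delivers, is traffic shaping. How the Linksys SWR208 implements its QoS rate limit isn't known here; a token bucket is one common technique for this kind of shaper, and the sketch below is a simulation of that technique, not anything that runs on the switch. It shows how a shaper spaces out an upload burst so packets leave no faster than the configured rate. One plausible reading of why the workaround helps (an assumption, not confirmed in this thread) is that the queue then forms in the switch, which we control, instead of in the modem or further upstream. The 480 kbit/s rate and 1500-byte burst in the example are hypothetical values.

```python
from typing import List, Sequence


class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative simulation).

    Tokens are bytes; they refill at rate_bps / 8 bytes per second, up to
    burst_bytes. A packet may leave only when enough tokens are available.
    Times are in seconds and must not go backwards.
    """

    def __init__(self, rate_bps: float, burst_bytes: int) -> None:
        self.rate = rate_bps / 8.0          # refill rate, bytes per second
        self.capacity = float(burst_bytes)  # largest burst allowed, bytes
        self.tokens = float(burst_bytes)    # bucket starts full
        self.last = 0.0                     # time of the last refill

    def _refill(self, now: float) -> None:
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def departure(self, now: float, size: int) -> float:
        """Earliest time >= now at which `size` bytes may be sent; consumes tokens."""
        if size > self.capacity:
            raise ValueError("packet larger than bucket; raise burst_bytes")
        self._refill(now)
        if self.tokens < size:
            now += (size - self.tokens) / self.rate  # wait for the shortfall
            self._refill(now)
        self.tokens -= size
        return now


def shape(arrivals: Sequence[float], sizes: Sequence[int],
          rate_bps: float, burst_bytes: int) -> List[float]:
    """FIFO shaper: departure time of each packet, in arrival order (times >= 0)."""
    bucket = TokenBucket(rate_bps, burst_bytes)
    t = 0.0
    out: List[float] = []
    for arrival, size in zip(arrivals, sizes):
        # A packet cannot leave before it arrives or before the one ahead of it.
        t = bucket.departure(max(t, arrival), size)
        out.append(t)
    return out


if __name__ == "__main__":
    # Hypothetical: 100 full-size packets from an FTP upload, all queued at t=0.
    times = shape([0.0] * 100, [1500] * 100, rate_bps=480_000, burst_bytes=1500)
    rate = (len(times) - 1) * 1500 * 8 / times[-1]
    print(f"last packet leaves at {times[-1]:.3f}s, {rate / 1000:.0f} kbit/s")
```

A small burst size keeps the output close to a steady stream; a larger one lets short bursts through at full line speed, which would defeat the purpose if the goal is to keep the modem's own queue empty.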

cat /proc/avalanche/avsar_modem_stats

AR7 DSL Modem Statistics:
------------------------------
[DSL Modem Stats]
      US Connection Rate:     608     DS Connection Rate:     3584
      DS Line Attenuation:    4       DS Margin:              29
      US Line Attenuation:    8       US Margin:              19
      US Payload :            647461440       DS Payload:             51335995

      US Superframe Cnt :     21648590        DS Superframe Cnt:      21648590
      US Transmit Power :     11      DS Transmit Power:      13
      LOS errors:             0       SEF errors:             0
      Frame mode:             3       Max Frame mode:         0
      Trained Path:           0       US Peak Cell Rate:      1433
      Trained Mode:           3       Selected Mode:          1
      ATUC Vendor ID: 1095516994      ATUC Revision:          1
      Hybrid Selected:        1

      [Upstream (TX) Interleave path]
      CRC:    0       FEC:    0       NCD:    1
      LCD:    0       HEC:    0

      [Downstream (RX) Interleave path]
      CRC:    0       FEC:    0       NCD:    0
      LCD:    0       HEC:    0

      [Upstream (TX) Fast path]
      CRC:    2       FEC:    34      NCD:    0
      LCD:    0       HEC:    0

      [Downstream (RX) Fast path]
      CRC:    1       FEC:    0       NCD:    0
      LCD:    0       HEC:    0

[ATM Stats]
      [Upstream/TX]
      Good Cell Cnt:  13488780
      Idle Cell Cnt:  514246659


      [Downstream/RX]
      Good Cell Cnt:  10694999
      Idle Cell Cnt:  3100166526
      Bad Hec Cell Cnt:       0
      Overflow Dropped Cell Cnt:      0

[SAR AAL5 Stats]
      Tx PDU's:       1900386
      Rx PDU's:       1854629
      Tx Total Bytes: 574556314
      Rx Total Bytes: 435557320
      Tx Total Error Counts:  0
      Rx Total Error Counts:  7243


[OAM Stats]
      Near End F5 Loop Back Count:    38822
      Near End F4 Loop Back Count:    0
      Far End F5 Loop Back Count:     1
      Far End F4 Loop Back Count:     0
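
Why "just below" the delivered speed rather than the modem's connection rate? On an ADSL line the data rides in 53-byte ATM cells carrying 48 bytes of payload each, so IP throughput is noticeably lower than the rate the modem reports. The sketch below estimates that gap for the US Connection Rate of 608 shown above, assuming that figure is the ATM rate in kbit/s and assuming PPPoA with VC-multiplexed encapsulation (2 bytes of PPP header per packet; LLC encapsulation adds 4 more). It also converts a rise in ping time into the amount of data that would have to be queued ahead of the ping. The function names are illustrative.

```python
import math

ATM_CELL_BYTES = 53     # 5-byte header + 48-byte payload
ATM_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8  # AAL5 CPCS trailer appended to every packet
PPPOA_VCMUX_BYTES = 2   # assumed: PPP protocol field only (VC-mux PPPoA)


def cells_per_packet(ip_bytes: int, encap_bytes: int = PPPOA_VCMUX_BYTES) -> int:
    """ATM cells needed for one IP packet (AAL5 pads the last cell)."""
    return math.ceil((ip_bytes + encap_bytes + AAL5_TRAILER_BYTES) / ATM_PAYLOAD_BYTES)


def ip_rate_kbps(sync_kbps: float, ip_bytes: int = 1500,
                 encap_bytes: int = PPPOA_VCMUX_BYTES) -> float:
    """IP-layer throughput for back-to-back packets of ip_bytes at an ATM rate."""
    wire_bytes = cells_per_packet(ip_bytes, encap_bytes) * ATM_CELL_BYTES
    return sync_kbps * ip_bytes / wire_bytes


def queued_bytes(extra_delay_ms: float, rate_kbps: float) -> float:
    """Bytes waiting ahead of a ping in a FIFO queue to add extra_delay_ms."""
    return (extra_delay_ms / 1000.0) * (rate_kbps * 1000.0) / 8.0


if __name__ == "__main__":
    up = ip_rate_kbps(608)  # US Connection Rate from the modem stats
    print(f"full-size packets: about {up:.0f} kbit/s of IP traffic upstream")
    print(f"40-byte packets:   about {ip_rate_kbps(608, ip_bytes=40):.0f} kbit/s")
    # Ping times rising from ~100 ms to ~1000 ms is ~900 ms of extra queueing.
    print(f"about {queued_bytes(900, up) / 1000:.0f} kB queued ahead of each ping")
```

Under these assumptions, full-size packets carry roughly 538 kbit/s of IP data on a 608 kbit/s connection rate, about 88% of the headline figure, and roughly 900 ms of added ping delay corresponds to about 60 kB waiting in a queue somewhere on the upstream path.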

Expert Comment

by: kyleb84
Is it possible the router is set to "oversubscribe" the link?

Author Comment

by: Fred Marshall
Thank you!
I'm not sure what router you're referring to.  Here is the setup:

ISP gateway to assigned block of IP addresses.
V
PPPOA link
V
Local modem
V
"Internet Switch"  Linksys SWR208 managed switch.
V----------------V----------------V
Firewall-------VPN-------------VPN <<<<<public IP addresses on upstream side
Juniper--------RV042----------Cisco
SSG5

(The VPNs are on separate devices for a number of reasons but conceptually could be implemented in the Juniper SSG5).

Now, it seems like we already had the problem, but I can't recall for sure, so we replaced our original "internet switch" with the SWR208 managed switch so we could monitor things.  We never did see dropped packets there.  But the problem was clearly apparent (yet intermittent from site to site and from month to month) after installing these switches in place of "dumb" switches.
The VPNs don't have much traffic and don't seem to correlate with the problem.
It's the firewall upstream traffic that can be demonstrated to cause trouble.
The Juniper firewalls were introduced somewhat recently too, so maybe that would be the router you're referring to.

I don't know what "oversubscribe" means exactly, so I don't know where to look.  In concept, OK, but in practice what other terms or settings might attach to that idea?  Or, where would you look in an SSG5, if you happen to know that one?  I've certainly not deviated from the default settings in terms of QoS or whatever else it might be.


Accepted Solution

by: Fred Marshall
Oversubscription would be something the ISP or telco would do.  Of course they do it, calling it "statistical multiplexing".  At least on the ISP side, I'm told that applies to a very fat pipe that's actually loaded at 50%.  We don't know about the telco side, but they have said it's nothing they've ever had to deal with like this.

Our application isn't much worse than a home situation would be with a long upload.

Expert Comment

by: Press2Esc
I don't think it is a DSL line capacity issue, because your SNR and attenuation stats look great.
With the exception of the Rx Total Error Counts of 7243 under the SAR AAL5 (ATM Adaptation Layer 5) stats, everything looks in order.  Also, I'm not sure of the ramifications of running a PPPoA session over an Ethernet connection.
I am beginning to suspect some excessive LAN traffic affecting the overall availability of the WAN bandwidth provided to you by your ISP, kinda like the way P2P or some trojans can eat up network bandwidth.  I would start checking firewall ports and log files in search of unwanted or unknown traffic.
P2E