agilitycg

Flapping PTP T1 on Cisco 1721's

This is driving me nuts! I have a PTP T1 between two Cisco 1721s, each with a WIC-DSU-T1 card (V1, not V2). Both sides have been configured with T1 defaults (ESF framing, B8ZS line coding, timeslots 1-24). One site is set to internal clock source, the other is set to line, and both are configured for PPP encapsulation.
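For reference, the setup described above would look roughly like the sketch below. This is an assumption-laden illustration, not the actual running configuration: the interface name Serial0 is taken from the show commands used later in the thread, IP addressing is omitted, and the commands are the integrated CSU/DSU service-module commands. ESF, B8ZS, and clock source line are, as far as I know, the defaults on this card, so they may not appear in a real show running-config.

```
! Site A (assumed to be the end that supplies clock)
interface Serial0
 encapsulation ppp
 service-module t1 clock source internal
 service-module t1 framing esf
 service-module t1 linecode b8zs
 service-module t1 timeslots 1-24
!
! Site B (assumed to be the end that recovers clock from the line)
interface Serial0
 encapsulation ppp
 service-module t1 clock source line
 service-module t1 framing esf
 service-module t1 linecode b8zs
 service-module t1 timeslots 1-24
```

The only intended difference between the two ends is the clock source line, so that exactly one router drives timing on the circuit.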

Now, I get calls from the contracted tech supporting these routers and he says the T1 goes down about once an hour for a period of about 15 seconds. When I remote in, I see lots of Input Errors, all in the form of CRC, Frame, Abort, Interface Reset and Carrier Transitions. Now, from experience and research, these are the steps I have taken to isolate the problem.

1) Performed numerous hardware loopback tests on each CSU/DSU using extended ping tests. All ping tests used 1500-byte packets and various data patterns (0xffff, 0x0000, 0x0101, etc.), with each test consisting of about 2500 pings. Both WICs tested clean.
2) Had the contracted tech test each cable; both came back 100%. They are solid copper shielded twisted pair, about 10 feet apiece.
3) Had the telco do their tests. Don't know what they did but they claim they are clean on their side.

Cisco's serial line troubleshooting doc indicates that these errors are produced by mismatched framing and clocking configurations, bad cards, bad cables, etc. I believe I have eliminated all of these as possible causes, yet I cannot seem to track down the problem. When I test and stress the circuit, I cannot reproduce any errors.

Any ideas as to what I am missing? Do I need to specify the clock rate on the router that is set to internal clocking?
mgob

Carrier transition means a hard bounce on the line normally.  Did the telco check both the NIUs for trouble?
agilitycg

ASKER

Hey mgob, thanks for the quick reply.

I am not sure what the tests they performed were, but I will definitely request that they check what you suggested, and if they already have tested it, I will post the results here.
Excellent, you can also ask them to run a test late at night where they loop up your CPE and run from the NIU / SPAN to the CO. If it drops every hour, as you say, have them run it from, say, 10 PM to 11 PM and see if it fails.
both t1s should be internal clock line
bkepford,

Are you sure? Everything I have read, and in my experience, if you have both set to line or both set to internal, you will get clock mismatches and even more severe flapping. Am I mistaken?
agilitycg, I've never heard of internal clock line.... there is internal... and there is line... but not both?
mgob,

I think he was referring to setting them both to clock source internal... I think anyway. Either way, am I correct in that if both are set to the same value, there will be clocking mismatches?
Yes, because if you set both to internal their clocks will never match and you'll get clock skew out the yin-yang.

This is a p2p T1, meaning the carrier *does not* provide clock; you have to provide it on one of the routers, just as you are doing.

Plus clock errors wouldn't account for carrier transitions ;)
err I mean clock source line
Yeah, you are right on the clocking; I wasn't paying attention to the big PTP. Even though you had the cables tested, I would replace what I could. Loose connections can test good but still fail in service.
Thanks for the suggestions guys. I am working with the telco, getting them to test and relay the results to me. When I have updates I will post them here so that we can move forward on resolving this thread.

Again, thank you all :)
Here is the response from the telco. I know some of the location terminology will mean nothing to you, but I am leaving it in to keep the response complete.

"Our field technician and AT&T tested the physical circuit as it comes in and out of our collocation facility.   We tested towards the Bronze Way location because AT&T saw errors towards that location.  After a bit of plugging and unplugging connectors errors cleared.  While we had both AT&T and our technician in place we tested back towards Marvin B Love as well and found no issues.

As a just in case test I had the AT&T tech perform what he called a round robin test.  As the circuit we tested is a Point to Point circuit he was able to loop up the smartjacks at both locations and run a pattern test to the loop that was formed by having both smartjacks looped up.  After 15 minutes, there were no errors reported.  

AT&T at this time states the physical circuits to both locations and through our network are clean."

Since they did find a few errors that have been cleared on their end, I am going to clear the counters on these routers and see if the issue is gone. I have a feeling that their 15 minute test wasn't sufficient because the flapping happens at a greater interval than 15 minutes.
Thought I would mention this to see if it means anything significant. After I arrived at the office today, I cleared the counters on both routers to get accurate readings throughout the day. About 15 minutes later, I did a show interface serial 0 on both ends, and both showed zero errors. I then did a show service-module serial 0 on both ends; one end was clean of errors, and the other had the following:

Total Data (last 96 15 minute intervals):
    0 Line Code Violations, 3586 Path Code Violations
    257 Slip Secs, 3742 Fr Loss Secs, 0 Line Err Secs, 121 Degraded Mins
    142 Errored Secs, 85 Bursty Err Secs, 102 Severely Err Secs, 3674 Unavail Secs


Does this data typically clear after a clear counters command is given, or does it remain? If it remains, then this might be significant in that these errors only show up on one end of the PTP.
I would think so, however I don't have a router of that model to test on at the moment.  How have things been since the telco came out?
They didn't clear. Notice that you checked only 15 minutes after doing the clear counters, yet the router still has stats for 96 15-minute periods. 96 x 15 minutes is 24 hours, so those totals include errors from up to 24 hours ago.

Mgob,

The issue seems to have stabilized. I appreciate your help greatly, and I feel the issue was on the telco's end. Thank you for your help.
Not a problem :)  Have a good weekend!