
"All circuits are busy" message received randomly when dialing out from Trixbox 2.8.

OAC Technology
We have a server running Trixbox 2.8 / Asterisk 1.6 and when we dial out, we will randomly get a message stating "All circuits are busy now."  This seems to happen randomly and if we redial the same number after receiving this message, the call almost always goes through without a problem.

We have two separate trunks set up using two different methods as posted below.  Both of these trunks have these random failures.  The service provider says they usually see a 503 message on their end, or nothing at all.  I get this message from the Asterisk server: == Everyone is busy/congested at this time (1:0/1/0)

The provider did some more digging and found that the SIP headers are sometimes being passed incorrectly: the user part of the From URI is missing.
Example of incorrect header being sent:
From: <sip:x.x.x.x:5060>;tag=31d8e646+1+0+71f506db
To: <sip:x.x.x.x:5060>;tag=as7e7c519b
Example of correct header being sent:
From: <sip:"CallerID"5555551000@x.x.x.x:5060>;tag=31d8e646+1+0+71f506db
To: <sip:x.x.x.x:5060>;tag=as7e7c519b
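
To spot the malformed case in a SIP debug log or packet capture, a small check along these lines may help. It is only a sketch: the function name has_user_part is hypothetical, and the regex handles the simple <sip:...> form shown above rather than the full SIP URI grammar.

```python
import re

# Captures the URI inside the angle brackets of a From/To header,
# e.g. <sip:user@host:5060>
_URI_RE = re.compile(r"<sips?:([^>]*)>")

def has_user_part(header_line):
    """Return True if the SIP URI in the header has a user part before '@'."""
    match = _URI_RE.search(header_line)
    if match is None:
        return False
    user, sep, _host = match.group(1).partition("@")
    return sep == "@" and user != ""

bad = "From: <sip:x.x.x.x:5060>;tag=31d8e646+1+0+71f506db"
good = 'From: <sip:"CallerID"5555551000@x.x.x.x:5060>;tag=31d8e646+1+0+71f506db'
print(has_user_part(bad))   # the header the provider flagged as incorrect
print(has_user_part(good))  # the header the provider accepted
```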

The test trunks that we have set up are below as well:

[TEST_TRUNK_1]
disallow=all
allow=ulaw
canreinvite=no
dtmfmode=auto
context=from-trunk
fromdomain=x.x.x.x
host=x.x.x.x
nat=never
insecure=invite
outboundproxy=x.x.x.x
qualify=no
srvlookup=no
type=peer
secret=secretive
username=##########
defaultexpirey=3600

[TEST_TRUNK_2]
host=x.x.x.x
type=peer
dtmfmode=auto

[TEST_TRUNK_2_USER]
host=x.x.x.x
type=peer
dtmfmode=auto

Has anyone else had these problems before?  Is there something I need to change on our system?

Ron MalmsteadInformation Services Manager

Commented:
Sounds more like a connectivity issue than a configuration issue to me.
With a configuration problem, calls would usually fail every time rather than randomly.

Is the trunk between equipment on the same LAN, or is it going over the WAN?
Do you have enough bandwidth?
I notice you are using ulaw (G.711), which uses more bandwidth per call than compressed codecs such as GSM or G.729.
Ron MalmsteadInformation Services Manager

Commented:
Also, on the other end of that trunk (the PSTN side), do you have enough channels to accommodate your simultaneous incoming and outgoing calls?
OAC TechnologyProfessional Nerds

Author

Commented:
The trunk is going over the WAN through a dedicated T1 circuit.  Bandwidth doesn't seem to be an issue, nor does channel capacity.  I have made at least 6 simultaneous calls to test the channels out with no issue, and the all circuits busy message can happen even when it is just one person using the phones.
Ron MalmsteadInformation Services Manager

Commented:
" Bandwidth doesn't seem to be an issue "
How are you verifying that ?

For instance, you could have only one call active, meanwhile one of your users is downloading some streaming media or other stuff you don't know about, while that call is taking place.

Do you have QOS to ensure that internet traffic cannot take precedence over your voice traffic ?

Also, on the other end of that WAN, is that connection used only for voice or both internet traffic and voice ?

On the Asterisk CLI, if you type "sip show peers" you should see the trunk peer with a status of OK, followed by the round-trip time in milliseconds for the peer's replies. A high number, e.g. 380 ms, could indicate that the trunk peer traffic isn't getting through fast enough.
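
If you want to log that number over time, a sketch like the following could pull it out of the Status column. The sample lines are simplified and made up (real output has more columns), and peer_latency_ms is a hypothetical helper name.

```python
import re

# Matches Status values that carry a round-trip time,
# such as "OK (35 ms)" or "LAGGED (2400 ms)".
_STATUS_RE = re.compile(r"\b(OK|LAGGED) \((\d+) ms\)")

def peer_latency_ms(line):
    """Return the reported round-trip time in ms, or None if the line has none
    (for example when the peer is Unmonitored or UNREACHABLE)."""
    match = _STATUS_RE.search(line)
    return int(match.group(2)) if match else None

# Simplified, made-up lines for illustration only.
sample_lines = [
    "TEST_TRUNK_1  x.x.x.x  5060  OK (35 ms)",
    "TEST_TRUNK_2  x.x.x.x  5060  Unmonitored",
]
for line in sample_lines:
    print(line.split()[0], peer_latency_ms(line))
```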

I think you'll have a better success rate on a 1.5 Mbit/s T1 if you use a lower-bandwidth codec as well.
Without QoS, however, the possibility always exists that other internet traffic can steal bandwidth from your voice traffic.
OAC TechnologyProfessional Nerds

Author

Commented:
The 1.5 T1 is only used for the phone system.

When I type SIP SHOW PEERS, all of my trunks show a status of "Unmonitored" rather than a latency time.  Is there a way to change this?

Which codec would you recommend?

Thanks for the help
Ron MalmsteadInformation Services Manager

Commented:
Unmonitored? Hmm, thinking about where I've seen that happen before.

Give me a couple of minutes, I'm going to look something up on Digium.
Information Services Manager
Commented:
G.729a is the best codec for getting good quality with minimal bandwidth usage, but it requires a license for each channel ($10/channel).

GSM is the codec I'm using. Its quality is "fair" and its bandwidth usage is very low, but it requires a little more CPU to transcode. No license required.

There's also Speex, which I haven't had a lot of experience with myself, but it is also low bandwidth and needs no license.

Then there's iLBC, which I've never used either.

Since I'm using GSM with no issues, I guess I would recommend that one first.
Ron MalmsteadInformation Services Manager

Commented:
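Change to qualify=yes

With qualify=no, Asterisk never pings the peer. That's why the peer is showing Unmonitored.

With qualify=yes, Asterisk periodically sends a SIP OPTIONS request to the peer and treats it as unreachable if the reply takes longer than 2000 ms.
If you set qualify=3000, that threshold becomes 3 seconds. How often the check is sent is controlled separately by the qualifyfreq setting (60 seconds by default).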

OAC TechnologyProfessional Nerds

Author

Commented:
The highest latency I've seen is 80ms. It is usually around 30-40ms.
Ron MalmsteadInformation Services Manager

Commented:
80 ms isn't high enough latency to produce this issue, in my opinion.

Can you confirm that it's still occurring?

What other troubleshooting, if any, have you done?

For example, do you see any errors on the CLI on either end when a call fails to go through?

Most Valuable Expert 2012
Commented:
I agree with xuserx2000. 80 ms just isn't high enough.

Based on what I am reading, I am assuming the following givens, correct me if I am wrong.

1. Trixbox is using a stable, known good version of Asterisk.
2. You are using SIP trunking over a T1, and are not using more than 6 ulaw channels at once.
3. The problem is intermittent.

Because of #1, I am going to dismiss the notion that it is a bug in Asterisk, and assume that all the Asterisk code is reading, parsing, and reacting to SIP properly.

Because of #2, and the 80 ms qualify time, I am going to assume this is not a bandwidth problem. However, it would be prudent to ensure you don't have something on the network chewing up bandwidth at will, like BitTorrent or something else.
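
As a rough sanity check on #2, per-call bandwidth can be estimated as below. This is a back-of-the-envelope sketch with stated assumptions: 20 ms packetization, IPv4 + UDP + RTP headers only (no layer-2 framing, RTCP, or SIP signaling), and 1,536 kbit/s of usable T1 payload. The function names are made up for illustration.

```python
# Rough per-call bandwidth at the IP layer, assuming 20 ms packetization.
IP_UDP_RTP_OVERHEAD_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers
PACKETS_PER_SECOND = 50                  # one packet every 20 ms

# Payload bytes per 20 ms of audio for each codec
PAYLOAD_BYTES = {
    "ulaw (G.711)": 160,  # 64 kbit/s
    "gsm": 33,            # about 13.2 kbit/s
    "g729": 20,           # 8 kbit/s
}

T1_BITS_PER_SECOND = 1_536_000  # 24 channels x 64 kbit/s

def call_bits_per_second(payload_bytes):
    """Bits per second for one direction of one call, headers included."""
    return (payload_bytes + IP_UDP_RTP_OVERHEAD_BYTES) * 8 * PACKETS_PER_SECOND

def max_calls(payload_bytes, link_bps=T1_BITS_PER_SECOND):
    """Whole number of simultaneous calls that fit on the link."""
    return link_bps // call_bits_per_second(payload_bytes)

for name, size in PAYLOAD_BYTES.items():
    print(name, call_bits_per_second(size), "bit/s,", max_calls(size), "calls")
```

Under these assumptions, six simultaneous ulaw calls need roughly 480 kbit/s in each direction, well under what a dedicated T1 carries.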

So we are left with reasons why the problem is intermittent.

A SIP 503 is generated when a proxy or gateway fails. It can be because the server is overloaded or has failed. Since you are seeing the 503, we can assume that the busy server is sending a Retry-After header. Is it? If not, then you should be seeing a 500. Either way, that points to the provider.

On the client side, if you are receiving a 503 error, that means you cannot connect upstream (still the provider's problem), but you should attempt to forward the request to another server. Ergo, you should have secondary and tertiary servers through which you can make the call request. Do you?

Transaction layer errors often show up as 503s because the 503 has been chosen as the catch-all for the transaction layer. The 503 has found substantial use in indicating failure or overload conditions in proxies, and this requires somewhat special treatment. Specifically, receipt of a 503 should trigger an attempt to contact the next element in the result of a DNS SRV lookup. Also, a 503 response is only forwarded upstream by a proxy under certain conditions.
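
The "try the next element on a 503" behavior can be sketched as follows. This is an illustration of the logic only, not how Asterisk implements it: place_call, send_invite, and the example.net host names are all hypothetical, and the transport is faked with a dictionary.

```python
def place_call(servers, send_invite):
    """Try each server in priority order; move on when one answers 503.

    `servers` is an ordered list (for example, the result of a DNS SRV lookup).
    `send_invite` is a caller-supplied function returning the final SIP status.
    Returns (server, status) for the first non-503 answer,
    or (None, 503) if every server answered 503.
    """
    for server in servers:
        status = send_invite(server)
        if status != 503:
            return server, status
    return None, 503

# Fake transport for illustration: the first server is overloaded.
fake_responses = {"sip1.example.net": 503, "sip2.example.net": 200}
print(place_call(["sip1.example.net", "sip2.example.net"], fake_responses.get))
```

With only one server configured there is nothing to fall back to, which is why a single 503 from the provider surfaces to the caller as "all circuits are busy."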

So, regarding the outbound proxy you have set in your sip.conf above. Is it necessary? Is that your proxy or theirs? What happens when you take it out?
OAC TechnologyProfessional Nerds

Author

Commented:
We decided to go with PRI and it solved the issue.